Individual UCEs

The gene trees were constructed using ETE toolkit:

Preparatory step: installing Miniconda, following the instructions from http://etetoolkit.org/download/

The gene trees were estimated using RAxML:

ete3 build -w standard_raxml -n /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA/uce-3.fasta -o /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA/uce-3-tree

Automatically generate the corresponding command for each of the 230 alignment files in the directory:

cd /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA
ls

Copy the output to a text file:

# Get the 1st part of the command

locus_table <- read.table("Set1-individual-loci.txt")
n <- length(unlist(locus_table))

inputfile <- vector()
for(i in locus_table) {
  inputfile <- append(inputfile, paste("ete3 build -w standard_raxml -n /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA/", i, sep = ""))
}

# Length check

length(inputfile) - n

# Remove file endings from the locus names

names_and_endings <- as.character(as.vector(as.matrix(locus_table)))
locus_names <- vector()
for(i in names_and_endings) {
  locus_names <- append(locus_names, sapply(strsplit(i, split='.', fixed=TRUE), function(x) (x[1])))
}

# Get the 2nd part of the command

outputfile <- vector()
for(i in locus_names) {
  outputfile <- append(outputfile, paste(" -o /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/", i, sep=""))
}

# Length check

length(outputfile) - n

# Get the entire command

commands <- paste(inputfile, outputfile, "-tree &&", sep="")

# Length check

length(commands) - n

# Uncomment to print to file:
# write(commands, "gene-tree-analysis.sh")

#!/bin/bash was then inserted into the first line of the file to convert it into a shell script. The script was run as follows:

chmod 755 /Users/David/Grive/Alfaro_Lab/SortaDate/Locus_analysis/gene-tree-analysis.sh
/Users/David/Grive/Alfaro_Lab/SortaDate/Locus_analysis/gene-tree-analysis.sh

Write a new script that will copy the trees from the nested subdirectories to the same directory where the alignment files are located:

# 1st part of the command

copyfiles <- vector()
for(i in locus_names) {
  copyfiles <- append(copyfiles, paste("cp /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/", i, sep = ""))
}

# 2nd part of the command

subdirectories <- vector()
for(i in locus_names) {
  subdirectories <- append(subdirectories, paste("-tree/clustalo_default-none-none-raxml_default/", i, sep = ""))
}

# Get the entire command

copying <- paste(copyfiles, subdirectories, ".fasta.final_tree.nw /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA/ &&", sep="")

# Uncomment to print to file:
# write(copying, "copy-trees.sh")

Write a third script to rename the tree files so that they have the .tre file ending, which SortaDate looks for while searching its target directory:

# 1st part of the command

oldname <- vector()
for(i in locus_names) {
  oldname <- append(oldname, paste("mv /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA/", i, sep = ""))
}

# 2nd part of the command

newname <- vector()
for(i in locus_names) {
  newname <- append(newname, paste(".fasta.final_tree.nw /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA/", i, sep = ""))
}

# Get the entire command:

renametrees <- paste(oldname, newname, ".tre &&", sep = "")

# Uncomment to print to file:
write(renametrees, "rename-trees.sh")

Phase 1: get_var_length

python src/get_var_length.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA/ --flend .tre --outf Locus_analysis/var-uces --outg alepisaurus_ferox,ceratoscopelus_warmingii

Phase 2: get_bp_genetrees

python src/get_bp_genetrees.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA/ /Users/David/Grive/Alfaro_Lab/Acanthomorpha/12_no_outgroups_scheme_3.tre --flend .tre --outf Locus_analysis/bp-uces

Phase 3: combine_results

python src/combine_results.py Locus_analysis/var-uces Locus_analysis/bp-uces --outf Locus_analysis/comb-uces

Phase 4: get_good_genes

In order of descending priority: bipartition support, root-to-tip variance, tree length

python src/get_good_genes.py Locus_analysis/comb-uces --max 3 --order 3,1,2 --outf Locus_analysis/gg-uces-312
name root-to-tip_var treelength bipartition
uce-1184.tre 0.00510364 5.59114 0.491379310345
uce-383.tre 0.0094896 4.68126 0.465517241379
uce-1317.tre 0.0200846 8.06488 0.465517241379

In order of descending priority: bipartition support, tree length, root-to-tip variance

python src/get_good_genes.py Locus_analysis/comb-uces --max 3 --order 3,2,1 --outf Locus_analysis/gg-uces-321
name root-to-tip_var treelength bipartition
uce-1184.tre 0.00510364 5.59114 0.491379310345
uce-383.tre 0.0094896 4.68126 0.465517241379
uce-1317.tre 0.0200846 8.06488 0.465517241379

In order of descending priority: root-to-tip variance, tree length, bipartition support

python src/get_good_genes.py Locus_analysis/comb-uces --max 3 --order 1,2,3 --outf Locus_analysis/gg-uces-123
name root-to-tip_var treelength bipartition
uce-1075.tre 7.59021e-05 0.363795 0.0689655172414
uce-446.tre 0.000230387 0.897702 0.26724137931
uce-855.tre 0.000309407 1.03296 0.301724137931

In order of descending priority: root-to-tip variance, bipartition support, tree length

python src/get_good_genes.py Locus_analysis/comb-uces --max 3 --order 1,3,2 --outf Locus_analysis/gg-uces-132
name root-to-tip_var treelength bipartition
uce-1075.tre 7.59021e-05 0.363795 0.0689655172414
uce-446.tre 0.000230387 0.897702 0.26724137931
uce-855.tre 0.000309407 1.03296 0.301724137931

In order of descending priority: tree length, root-to-tip variance, bipartition support

python src/get_good_genes.py Locus_analysis/comb-uces --max 3 --order 2,1,3 --outf Locus_analysis/gg-uces-213
name root-to-tip_var treelength bipartition
uce-1075.tre 7.59021e-05 0.363795 0.0689655172414
uce-737.tre 0.000743068 0.814482 0.137931034483
uce-446.tre 0.000230387 0.897702 0.26724137931

In order of descending priority: tree length, bipartition support, root-to-tip variance

python src/get_good_genes.py Locus_analysis/comb-uces --max 3 --order 2,3,1 --outf Locus_analysis/gg-uces-231
name root-to-tip_var treelength bipartition
uce-1075.tre 7.59021e-05 0.363795 0.0689655172414
uce-737.tre 0.000743068 0.814482 0.137931034483
uce-446.tre 0.000230387 0.897702 0.26724137931

k-means partitions

Note that the largest, slowest-evolving partition (“ccf55a6ee6d62f840a124bcc0c98ecf5”; 132 kb) was excluded from the first round of analyses for computational reasons. Attempts to analyze it in RAxML after the remaining 31 analyses finished up were unsuccessful.

# Get the 1st part of the command

kmeans_table <- read.table("Set2-kmeans-partitions.txt")
n2 <- length(unlist(kmeans_table))

first <- vector()
for(i in kmeans_table) {
  first <- append(first, paste("ete3 build -w standard_raxml -n /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/", i, sep = ""))
}

# Length check

length(first) - n2

# Remove file endings from the locus names

no_endings <- as.character(as.vector(as.matrix(kmeans_table)))
kmeans_names <- vector()
for(i in no_endings) {
  kmeans_names <- append(kmeans_names, sapply(strsplit(i, split='.', fixed=TRUE), function(x) (x[1])))
}
kmeans_names

# Get the 2nd part of the command

second <- vector()
for(i in kmeans_names) {
  second <- append(second, paste(" -o /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/", i, sep=""))
}
second

# Length check

length(second) - n2

# Get the entire command

kmeansscript <- paste(first, second, "-tree &&", sep="")

# Length check

length(kmeansscript) - n2

# Uncomment to print to file:
# write(kmeansscript, "kmeans-analysis.sh")

Copy the trees into the directory containing the alignments:

# 1st part of the command

copytrees <- vector()
for(i in kmeans_names) {
  copytrees <- append(copytrees, paste("cp /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/", i, sep = ""))
}

# 2nd part of the command

locations <- vector()
for(i in kmeans_names) {
  locations <- append(locations, paste("-tree/clustalo_default-none-none-raxml_default/", i, sep = ""))
}

# Get the entire command

finalstep <- paste(copytrees, locations, ".phy.fasta.final_tree.nw /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/FASTA/ &&", sep="")

# Uncomment to print to file:
# write(finalstep, "copy-kmeans-trees.sh")

Change the tree names so that they correspond to the partition names:

# 1st part of the command

oldtreename <- vector()
for(i in kmeans_names) {
  oldtreename <- append(oldtreename, paste("mv /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/FASTA/", i, sep = ""))
}

# 2nd part of the command

newtreename <- vector()
for(i in kmeans_names) {
  newtreename <- append(newtreename, paste(".phy.fasta.final_tree.nw /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/FASTA/", i, sep = ""))
}

# Get the entire command:

changetreenames <- paste(oldtreename, newtreename, ".tre &&", sep = "")

# Uncomment to print to file:
# write(changetreenames, "rename-kmeans-trees.sh")

Finally, rename the partitions:

# 1st part of the command

oldpartition <- vector()
for(i in kmeans_names) {
  oldpartition <- append(oldpartition, paste("mv /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/FASTA/", i, sep = ""))
}

# 2nd part of the command

newpartition <- vector()
for(i in kmeans_names) {
  newpartition <- append(newpartition, paste(".phy.fasta /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/FASTA/", i, sep = ""))
}

# Get the entire command
  
renamepartitions <- paste(oldpartition, newpartition, ".fasta &&", sep = "")

# Uncomment to print to file:
# write(renamepartitions, "rename-partitions.sh")

Phase 1: get_var_length

python src/get_var_length.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/FASTA/ --flend .tre --outf var-kmeans --outg alepisaurus_ferox,ceratoscopelus_warmingii

Phase 2: get_bp_genetrees

python src/get_bp_genetrees.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/k-means_partitions/FASTA/ /Users/David/Grive/Alfaro_Lab/Acanthomorpha/12_no_outgroups_scheme_3.tre --flend .tre --outf bp-kmeans

Phase 3: combine_results

python src/combine_results.py var-kmeans bp-kmeans --outf comb-kmeans

Phase 4: get_good_genes

In order of descending priority: bipartition support, root-to-tip variance, tree length

name root-to-tip_var treelength bipartition
8be7c94dcf2d71970c663f6710af40d7.tre 0.0878182 19.8649 0.560344827586
f495e4e0f2f9bbbf091f778067b062f4.tre 0.0230499 11.0301 0.508620689655
0061a6137d1029978876fe13239f57bc.tre 0.0205424 11.2184 0.5

In order of descending priority: bipartition support, tree length, root-to-tip variance

name root-to-tip_var treelength bipartition
8be7c94dcf2d71970c663f6710af40d7.tre 0.0878182 19.8649 0.560344827586
f495e4e0f2f9bbbf091f778067b062f4.tre 0.0230499 11.0301 0.508620689655
0061a6137d1029978876fe13239f57bc.tre 0.0205424 11.2184 0.5

In order of descending priority: root-to-tip variance, tree length, bipartition support

name root-to-tip_var treelength bipartition
4ff6dce0b7cebc9428b4b77e79356484.tre 0.000241636 1.15434 0.0258620689655
589d12bf910a9937d941aa4c836ad87d.tre 0.000935835 2.0223 0.336206896552
fe7aa5d9de0f5f51ef6365ef3b7f1e06.tre 0.00125304 2.60574 0.0603448275862

In order of descending priority: root-to-tip variance, bipartition support, tree length

name root-to-tip_var treelength bipartition
4ff6dce0b7cebc9428b4b77e79356484.tre 0.000241636 1.15434 0.0258620689655
589d12bf910a9937d941aa4c836ad87d.tre 0.000935835 2.0223 0.336206896552
fe7aa5d9de0f5f51ef6365ef3b7f1e06.tre 0.00125304 2.60574 0.0603448275862

In order of descending priority: tree length, root-to-tip variance, bipartition support

name root-to-tip_var treelength bipartition
4ff6dce0b7cebc9428b4b77e79356484.tre 0.000241636 1.15434 0.0258620689655
589d12bf910a9937d941aa4c836ad87d.tre 0.000935835 2.0223 0.336206896552
fe7aa5d9de0f5f51ef6365ef3b7f1e06.tre 0.00125304 2.60574 0.0603448275862

In order of descending priority: tree length, bipartition support, root-to-tip variance

name root-to-tip_var treelength bipartition
4ff6dce0b7cebc9428b4b77e79356484.tre 0.000241636 1.15434 0.0258620689655
589d12bf910a9937d941aa4c836ad87d.tre 0.000935835 2.0223 0.336206896552
fe7aa5d9de0f5f51ef6365ef3b7f1e06.tre 0.00125304 2.60574 0.0603448275862

50-bp Chunks

A bash script was written to automate the following actions:

  1. Create a new directory nested in /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/FASTA for each locus
  2. Split the alignments for each locus into individual sequences, and these into 50-bp chunks. This step was performed using a custom Python script obtained from http://www.reddit.com/r/bioinformatics/comments/1u8yc7/looking_for_a_script_that_will_split_dna/ceg8rav/?st=j0tbjfco&sh=1e300055.

      '''
      Splits all sequences within a multi-fasta file into chunks of a specified size.
    
      Fasta header information is retained with each split sequence - its position in
      the original is appended to the id. Single-line and multi-line fasta files are
      supported. Prints to stout, so pipe to a file to store the result.
    
      Usage:
      python splitter.py <filename> <chunksize>
      python splitter.py myfile.fa 100
      '''
    
      from __future__ import print_function
      from sys import argv, version
    
      if version[0] == '2':
          from itertools import izip_longest as zl
      else:
          from itertools import zip_longest as zl
    
      chunksize = int(argv[2])
    
      def writeseq(header, seq):
          for i, chunk in enumerate(zl(*[iter(seq)]*chunksize, fillvalue='')):
              print(header + '_{}bp'.format(i*chunksize))
              print(''.join(chunk))
    
      with open(argv[1]) as f:
          header, seq = f.readline().rstrip(), ''
          for l in f:
              if l[0] != '>':
                  seq += l.rstrip()
              else:
                  writeseq(header, seq)
                  header, seq = l.rstrip(), ''
          writeseq(header, seq)
  3. For each locus, join the individual sequences chunk-wise (i.e., make a single fasta file for all taxa and sites 0 to 50, another one for all taxa and sites 51 to 100, etc.):

    library(dplyr)
    
    # Alternating rows (name, sequence, name, sequence) go to two different columns, so that
    # each sequence is correctly assigned to its respective taxon:
    
    split_seqs <- read.table("split.txt")
    odd <- as.vector(split_seqs[seq(1, nrow(split_seqs), 2), ])
    even <- as.vector(split_seqs[seq(2, nrow(split_seqs), 2), ])
    odd_name <- "taxon"
    even_name <- "sequence"
    split_seqs_new <- data.frame(odd, even)
    names(split_seqs_new) <- c(odd_name, even_name)
    
    # Determine how long the locus is (i.e., how many 50-bp chunks it has been split into).
    # This can be done by counting the number of occurrences of a single taxon name. 
    # In principle, any name could be used, but since not all of the UCEs include
    # all of the taxa, it is advisable to choose a taxon common to all the loci.
    
    n <- length(unique(grep("chaetodon_kleinii", split_seqs_new[,1], value = TRUE)))
    
    # Create a vector of strings that can filter taxon names according to the base pair range
    # tag attached to their end
    
    chunks <- vector()
    for(i in 0:(n-1)) {
      chunks <- append(chunks, paste("_", i*50, "bp", sep = ""))
    }
    
    # Create a list of data frames. Each element of the list represents a base pair range
    # and consists of a data frame containing both the "taxon" and "sequence" columns of
    # split_seqs_new
    
    partition <- list()
    for(i in 1:length(chunks)) {
      partition[[i]] <- data.frame(filter(split_seqs_new, 
                                          grepl(as.character(chunks[i]), taxon)))
    }
    
    # Create a matrix whose columns represents individual chunks (i.e., base pair ranges)
    # and whose rows have the structure of the original split_seqs data frame -- i.e., name,
    # sequence, name, sequence:
    
    chunkmatrix <- matrix(ncol = length(partition), 
                          nrow = 2*(nrow(split_seqs_new)/length(partition)))
    for(i in 1:length(partition)) {
      for(j in 1:nrow(partition[[i]])) {
        chunkmatrix[(2*j - 1), i] <- as.character(partition[[i]][j, "taxon"])
        chunkmatrix[2*j, i] <- as.character(partition[[i]][j, "sequence"])
      }
    }
    
    # Print the resulting fasta files!
    
    for(i in 1:ncol(chunkmatrix)) {
      write(chunkmatrix[,i], paste("chunk_", i, ".fasta", sep = ""))
    }
  4. Add a locus-indicating prefix to all the chunks of a given UCE:

    find *.fasta -maxdepth 0 ! -path . -exec mv {} uce-1005_{} \;
  5. Copy the resulting fasta files into a single directory.

The contents of the directory were then summarized as follows:

cd /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/
ls > /Users/David/Grive/Alfaro_Lab/SortaDate/Set3-50bp-chunks.txt

Now, change the taxon names in the chunk FASTA files so that they correspond to the names in the reference tree. First, create a file with the names of all the chunk files:

cd /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/
ls *.fasta > /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/chunklist.txt
a <- read.table("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/chunklist.txt")

x <- vector()
for(i in 1:nrow(a)) {
  x <- append(x, paste("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/", a[i,], sep = ""))
}

for(i in 1:length(x)) {
  c <- read.table(print(x[i]), stringsAsFactors = FALSE)
  d <- vector()
  for(j in 1:(nrow(c)/2)) {
    d[j] <- as.character(c[(2*j-1),])
  }
  
  e <- vector()
  for(j in 1:length(d)) {
    e[j] <- paste(sapply(strsplit(d[j], split="_", fixed=TRUE), function(x) (x[1])), 
                  "_", 
                  sapply(strsplit(d[j], split="_", fixed=TRUE), function(x) (x[2])),
                  sep = "")
  }
  
  for(j in 1:(nrow(c)/2)) {
    c[(2*j-1),] <- e[j]
  }
  write(as.matrix(c), print(x[i]), ncolumns=1)
}

A script was generated to analyze all of the alignment in the directory using RAxML:

chunk_table <- read.table("Set3-50bp-chunks.txt")

newfolders <- vector()
for(i in chunk_table) {
  newfolders <- append(newfolders, paste("ete3 build -w standard_raxml -n /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/", i, sep=""))
}

with_endings <- as.character(as.vector(as.matrix(chunk_table)))
chunk_names <- vector()
for(i in with_endings) {
  chunk_names <- append(chunk_names, sapply(strsplit(i, split='.', fixed=TRUE), function(x) (x[1])))
}

tree_location <- vector()
for(i in chunk_names) {
  tree_location <- append(tree_location, paste(" -o /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/", i, sep=""))
}

analyzechunks <- paste(newfolders, tree_location, sep="")
write(analyzechunks, "chunk-analysis.sh")

The resulting tree files were then copied into the directory containing the alignments:

copyfrom1 <- vector()
for(i in chunk_names) {
  copyfrom1 <- append(copyfrom1, paste("cp /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/", i, sep = ""))
}

copyfrom2 <- vector()
for(i in chunk_names) {
  copyfrom2 <- append(copyfrom2, paste("/clustalo_default-none-none-raxml_default/", i, sep = ""))
}

copyto <- paste(copyfrom1, copyfrom2, ".fasta.final_tree.nw /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/", sep="")

write(copyto, "copy-chunk-trees.sh")

Rename the trees:

brew install rename
rename -S .fasta.final_tree.nw .tre *.fasta.final_tree.nw

Running ls *.fasta | wc -l and ls *.nw | wc -l in the directory showed that out of 1826 chunk FASTA files, no more than 1070 had tree files associated with them.

Phase 1: get_var_length

python src/get_var_length.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/ --flend .tre --outf Chunk_analysis/var-chunks --outg alepisaurus_ferox,ceratoscopelus_warmingii

Phase 2: get_bp_genetrees

python src/get_bp_genetrees.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/ /Users/David/Grive/Alfaro_Lab/Acanthomorpha/12_no_outgroups_scheme_3.tre --flend .tre --outf Chunk_analysis/bp-chunks

Phase 3: combine_results

python src/combine_results.py Chunk_analysis/var-chunks Chunk_analysis/bp-chunks --outf Chunk_analysis/comb-chunks

Calculate the Robinson-Foulds distances of the individual chunk trees from the reference tree using the Python script below:

import os, uuid
from ete3 import Tree

t2 = Tree("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/12_no_outgroups_scheme_3.tre")
for file in os.listdir("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks/"):
    if file.endswith(".tre"):
        t1 = Tree(file)
        try:
            rf = t1.robinson_foulds(t2)
            print str(file), (rf[0])
        except:
            pass

Step 1 of filtering: remove trees of zero length

No effect: the minimum observed tree length is non-zero and equal to:

## [1] 8.80081e-05

Phase 4: get_good_genes

In order of descending priority: bipartition support, root-to-tip variance, tree length

name root-to-tip_var treelength bipartition
uce-157_chunk_7.tre 0.176994 14.918 0.258620689655
uce-126_chunk_3.tre 0.104056 14.0824 0.224137931034
uce-323_chunk_2.tre 1.16412 48.0104 0.224137931034
## [1] "The Robinson-Foulds distances of the three bets chunks from the reference tree are as follows:"
##                   chunk  rf
## 321 uce-157_chunk_7.tre 164
##                   chunk  rf
## 219 uce-126_chunk_3.tre 166
##                   chunk  rf
## 443 uce-323_chunk_2.tre 168

The chunk with the minimum RF distance from the reference tree:

##                   chunk  rf
## 321 uce-157_chunk_7.tre 164

In order of descending priority: bipartition support, tree length, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-157_chunk_7.tre 0.176994 14.918 0.258620689655
uce-126_chunk_3.tre 0.104056 14.0824 0.224137931034
uce-323_chunk_2.tre 1.16412 48.0104 0.224137931034

In order of descending priority: root-to-tip variance, tree length, bipartition support

name root-to-tip_var treelength bipartition
uce-1243_chunk_4.tre 1.05801e-11 0.0215722 0.0
uce-600_chunk_3.tre 1.28251e-11 0.0205929 0.00862068965517
uce-737_chunk_4.tre 2.16826e-11 0.0215345 0.0

In order of descending priority: root-to-tip variance, bipartition support, tree length

name root-to-tip_var treelength bipartition
uce-1243_chunk_4.tre 1.05801e-11 0.0215722 0.0
uce-600_chunk_3.tre 1.28251e-11 0.0205929 0.00862068965517
uce-737_chunk_4.tre 2.16826e-11 0.0215345 0.0

In order of descending priority: tree length, root-to-tip variance, bipartition support

name root-to-tip_var treelength bipartition
uce-855_chunk_6.tre 3.60737e-11 8.80081e-05 0.0
uce-413_chunk_4.tre 3.51022e-06 0.0205344 0.0
uce-600_chunk_3.tre 1.28251e-11 0.0205929 0.00862068965517

In order of descending priority: tree length, bipartition support, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-855_chunk_6.tre 3.60737e-11 8.80081e-05 0.0
uce-413_chunk_4.tre 3.51022e-06 0.0205344 0.0
uce-600_chunk_3.tre 1.28251e-11 0.0205929 0.00862068965517

50-bp chunks: SH-like of 75% or above

  1. Copy all the tree files to a new directory:

    cd /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa
    mkdir Chunks_75
    cp Chunks/*.tre Chunks_75
  2. Collapse all the nodes with SH-like support values of less than 75% using the following Python script:

    import os, uuid
    from ete3 import Tree
    
    for file in os.listdir("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks"):
        if file.endswith(".tre"):
            outname = "/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks_75/" + str(file)
            t = Tree(file, format=0)
    
            print t.get_ascii(attributes=['support', 'name'])
    
            for node in t.get_descendants():
                if not node.is_leaf() and node.support <= 0.75:
                    node.delete()
    
            print t.get_ascii(attributes=['support', 'name'])
    
            t.write(format=0, outfile=outname)
  3. Copy the fasta alignments to the same directory:

    cp Chunks/*.fasta Chunks_75

Phase 1: get_var_length

python src/get_var_length.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks_75/ --flend .tre --outf Chunk75_analysis/var-chunks75 --outg alepisaurus_ferox,ceratoscopelus_warmingii

Phase 2: get_bp_genetrees

python src/get_bp_genetrees.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks_75/ /Users/David/Grive/Alfaro_Lab/Acanthomorpha/12_no_outgroups_scheme_3.tre --flend .tre --outf Chunk75_analysis/bp-chunks75

Phase 3: combine_results

python src/combine_results.py Chunk75_analysis/var-chunks75 Chunk75_analysis/bp-chunks75 --outf Chunk75_analysis/comb-chunks75

A script was written to delete all lines containing NAs, as well as all lines that only contained an entry for bipartition support but none for root-to-tip branch length variance or tree length. This was accomplished by filling these partially empty lines with NAs in the first step and deleting them in the second step:

combchunks75 <- read.table("/Users/David/Grive/Alfaro_Lab/SortaDate/Chunk75_analysis/comb-chunks75", fill = TRUE)
filtered <- na.omit(combchunks75)
write.table(filtered,
      "/Users/David/Grive/Alfaro_Lab/SortaDate/Chunk75_analysis/comb-chunks75-filtered",
      sep = "\t",
      quote = FALSE,
      row.names = FALSE,
      col.names = FALSE)

The length of the filtered comb file is 901 lines, meaning that SortaDate failed to obtain root-to-tip variance and/or tree length for 169 trees.

Step 1 of filtering: remove trees of zero length

No effect: the minimum observed tree length is non-zero and equal to:

## [1] 0.0202802

Note that this result is orders of magnitude larger than those observed in the other five chunk datasets.

Phase 4: get_good_genes

In order of descending priority: bipartition support, root-to-tip variance, tree length

name root-to-tip_var treelength bipartition
uce-157_chunk_7.tre 0.163848 13.508 0.198275862069
uce-126_chunk_3.tre 0.101031 13.9331 0.181034482759
uce-323_chunk_2.tre 0.459664 40.9379 0.146551724138
## [1] "The Robinson-Foulds distances of the three best chunks from the reference tree are as follows:"
##                   chunk  rf
## 284 uce-157_chunk_7.tre 126
##                   chunk  rf
## 194 uce-126_chunk_3.tre 114
##                   chunk  rf
## 391 uce-323_chunk_2.tre 125

The chunk with the minimum RF distance from the reference tree:

##                    chunk  rf
## 189 uce-1263_chunk_2.tre 106

In order of descending priority: bipartition support, tree length, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-157_chunk_7.tre 0.163848 13.508 0.198275862069
uce-126_chunk_3.tre 0.101031 13.9331 0.181034482759
uce-323_chunk_2.tre 0.459664 40.9379 0.146551724138

In order of descending priority: root-to-tip variance, tree length, bipartition support

name root-to-tip_var treelength bipartition
uce-815_chunk_8.tre 0.0 0.0202802 0.0
uce-359_chunk_7.tre 0.0 0.0203467 0.00862068965517
uce-160_chunk_3.tre 0.0 0.0204413 0.0

In order of descending priority: root-to-tip variance, bipartition support, tree length

name root-to-tip_var treelength bipartition
uce-200_chunk_2.tre 0.0 0.0317025 0.0344827586207
uce-129_chunk_5.tre 0.0 0.0211935 0.0258620689655
uce-1062_chunk_5.tre 0.0 0.113443 0.0258620689655

In order of descending priority: tree length, root-to-tip variance, bipartition support

name root-to-tip_var treelength bipartition
uce-815_chunk_8.tre 0.0 0.0202802 0.0
uce-359_chunk_7.tre 0.0 0.0203467 0.00862068965517
uce-160_chunk_3.tre 0.0 0.0204413 0.0

In order of descending priority: tree length, bipartition support, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-815_chunk_8.tre 0.0 0.0202802 0.0
uce-359_chunk_7.tre 0.0 0.0203467 0.00862068965517
uce-160_chunk_3.tre 0.0 0.0204413 0.0

50-bp chunks: SH-like of 90% or above

(The first three steps were identical to those described above.)

Phase 1: get_var_length

python src/get_var_length.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks_90/ --flend .tre --outf Chunk90_analysis/var-chunks90 --outg alepisaurus_ferox,ceratoscopelus_warmingii

Phase 2: get_bp_genetrees

python src/get_bp_genetrees.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-114-taxa/Chunks_90/ /Users/David/Grive/Alfaro_Lab/Acanthomorpha/12_no_outgroups_scheme_3.tre --flend .tre --outf Chunk90_analysis/bp-chunks90

Phase 3: combine_results

python src/combine_results.py Chunk90_analysis/var-chunks90 Chunk90_analysis/bp-chunks90 --outf Chunk90_analysis/comb-chunks90

Delete the lines with NAs:

combchunks90 <- read.table("/Users/David/Grive/Alfaro_Lab/SortaDate/Chunk90_analysis/comb-chunks90", fill = TRUE)
filtered <- na.omit(combchunks90)
write.table(filtered,
      "/Users/David/Grive/Alfaro_Lab/SortaDate/comb-chunks90-filtered",
      sep = "\t",
      quote = FALSE,
      row.names = FALSE,
      col.names = FALSE)

The length of the filtered comb file is 536 lines, meaning that SortaDate failed to obtain root-to-tip variance and/or tree length for 534 (almost 50%) trees.

Step 1 of filtering: remove trees of zero length

No effect: the minimum observed tree length is non-zero and equal to:

## [1] 8.71304e-07

Phase 4: get_good_genes

In order of descending priority: bipartition support, root-to-tip variance, tree length

name root-to-tip_var treelength bipartition
uce-1317_chunk_8.tre 0.0166339 15.2225 0.0689655172414
uce-120_chunk_6.tre 0.187599 4.6936 0.0689655172414
uce-323_chunk_2.tre 0.325194 35.4206 0.0689655172414
## [1] "The Robinson-Foulds distances of the three best chunks from the reference tree are as follows:"
##                    chunk  rf
## 159 uce-1317_chunk_8.tre 111
##                  chunk  rf
## 81 uce-120_chunk_6.tre 114
##                   chunk  rf
## 248 uce-323_chunk_2.tre 107

The chunk with the minimum RF distance from the reference tree:

##                   chunk  rf
## 235 uce-267_chunk_7.tre 106

In order of descending priority: bipartition support, tree length, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-120_chunk_6.tre 0.187599 4.6936 0.0689655172414
uce-1317_chunk_8.tre 0.0166339 15.2225 0.0689655172414
uce-323_chunk_2.tre 0.325194 35.4206 0.0689655172414

In order of descending priority: root-to-tip variance, tree length, bipartition support

name root-to-tip_var treelength bipartition
uce-537_chunk_3.tre 0.0 8.71304e-07 0.00862068965517
uce-84_chunk_5.tre 0.0 9.84452e-07 0.0172413793103
uce-739_chunk_2.tre 0.0 2.2254e-06 0.0344827586207

In order of descending priority: root-to-tip variance, bipartition support, tree length

name root-to-tip_var treelength bipartition
uce-739_chunk_2.tre 0.0 2.2254e-06 0.0344827586207
uce-832_chunk_4.tre 0.0 1.05876 0.0258620689655
uce-84_chunk_5.tre 0.0 9.84452e-07 0.0172413793103

In order of descending priority: tree length, root-to-tip variance, bipartition support

name root-to-tip_var treelength bipartition
uce-537_chunk_3.tre 0.0 8.71304e-07 0.00862068965517
uce-84_chunk_5.tre 0.0 9.84452e-07 0.0172413793103
uce-739_chunk_2.tre 0.0 2.2254e-06 0.0344827586207

In order of descending priority: tree length, bipartition support, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-537_chunk_3.tre 0.0 8.71304e-07 0.00862068965517
uce-84_chunk_5.tre 0.0 9.84452e-07 0.0172413793103
uce-739_chunk_2.tre 0.0 2.2254e-06 0.0344827586207

50-bp chunks from the 75%-complete dataset

A bash script used to split the UCE loci into 50-bp chunks was identical to that used above, with the exception of step 3 (joining all taxa represented by a given chunk into a new FASTA file), for which the following code was used:

library(dplyr)

# Alternating rows (name, sequence, name, sequence) go to two different columns, so that
# each sequence is correctly assigned to its respective taxon:

split_seqs <- read.table("split.txt")
odd <- as.vector(split_seqs[seq(1, nrow(split_seqs), 2), ])
even <- as.vector(split_seqs[seq(2, nrow(split_seqs), 2), ])
odd_name <- "taxon"
even_name <- "sequence"
split_seqs_new <- data.frame(odd, even)
names(split_seqs_new) <- c(odd_name, even_name)

# Determine how long the locus is (i.e., how many 50-bp chunks it has been split into).
# This can be done by counting the number of occurrences of a single taxon name. 
# In principle, any name could be used, but since no taxon appears to be common to all
# the loci, the code below grabs the first taxon name appearing in the fasta file and
# counts its occurrences.

m <- paste(sapply(strsplit(as.character(split_seqs_new[1,1]), split="_", fixed=TRUE),
                  function(x) (x[1])), 
           "_", sapply(strsplit(as.character(split_seqs_new[1,1]), split="_", fixed=TRUE),                         function(x) (x[2])),
           sep = "")
n <- length(unique(grep(m, split_seqs_new[,1], value = TRUE)))

# Create a vector of strings that can filter taxon names according to the base pair range
# tag attached to their end

chunks <- vector()
for(i in 0:(n-1)) {
  chunks <- append(chunks, paste("_", i*50, "bp", sep = ""))
}

# Create a list of data frames. Each element of the list represents a base pair range
# and consists of a data frame containing both the "taxon" and "sequence" columns of
# split_seqs_new

partition <- list()
for(i in 1:length(chunks)) {
partition[[i]] <- data.frame(filter(split_seqs_new, 
                                    grepl(as.character(chunks[i]), taxon)))
}

# Create a matrix whose columns represents individual chunks (i.e., base pair ranges)
# and whose rows have the structure of the original split_seqs data frame -- i.e., name,
# sequence, name, sequence:

chunkmatrix <- matrix(ncol = length(partition), 
                      nrow = 2*(nrow(split_seqs_new)/length(partition)))
for(i in 1:length(partition)) {
  for(j in 1:nrow(partition[[i]])) {
    chunkmatrix[(2*j - 1), i] <- as.character(partition[[i]][j, "taxon"])
    chunkmatrix[2*j, i] <- as.character(partition[[i]][j, "sequence"])
  }
}

# Print the resulting fasta files!

for(i in 1:ncol(chunkmatrix)) {
  write(chunkmatrix[,i], paste("chunk_", i, ".fasta", sep = ""))
}

The taxon names in the chunk FASTA files were then stripped of the chunk-indicating suffixes so as to correspond with the names used in the reference tree, and a script was used to analyze all the chunks using RAxML. The resulting trees were then copied into the directory containing the chunks. Out of 6543 chunk FASTA files, only 4237 had tree files associated with them, suggesting that RAxML failed to infer a tree for a given chunk in approx. 35% of cases.

Phase 1: get_var_length

python src/get_var_length.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Chunks/ --flend .tre --outf Min-90-chunk_analysis/var-min-90-chunks --outg alepisaurus_ferox,ceratoscopelus_warmingii

Running the script produced 1029 warnings about either Alepisaurus ferox or Ceratoscopelus warmingii missing from the chunk tree. Despite this, the tree length and root-to-tip variance was computed for all the 4237 trees.

Phase 2: get_bp_genetrees

python src/get_bp_genetrees.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Chunks/ /Users/David/Grive/Alfaro_Lab/Acanthomorpha/12_no_outgroups_scheme_3.tre --flend .tre --outf Min-90-chunk_analysis/bp-min-90-chunks

Phase 3: combine_results

python src/combine_results.py Min-90-chunk_analysis/var-min-90-chunks Min-90-chunk_analysis/bp-min-90-chunks --outf Min-90-chunk_analysis/comb-min-90-chunks

Phase 4: get_good_genes

In order of descending priority: bipartition support, root-to-tip variance, tree length

name root-to-tip_var treelength bipartition
uce-157_chunk_7.tre 0.176994 14.918 0.258620689655
uce-1244_chunk_1.tre 0.0534134 11.1586 0.224137931034
uce-126_chunk_3.tre 0.104056 14.0824 0.224137931034
## [1] "The Robinson-Foulds distances of the three best chunks from the reference tree are as follows:"
##                    chunk  rf
## 1229 uce-157_chunk_7.tre 164
##                    chunk  rf
## 822 uce-1244_chunk_1.tre 158
##                   chunk  rf
## 907 uce-126_chunk_3.tre 166

The chunk with the minimum RF distance from the reference tree:

##                  chunk  rf
## 6 uce-1000_chunk_6.tre 136

In order of descending priority: bipartition support, tree length, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-157_chunk_7.tre 0.176994 14.918 0.258620689655
uce-1244_chunk_1.tre 0.0534134 11.1586 0.224137931034
uce-126_chunk_3.tre 0.104056 14.0824 0.224137931034

In order of descending priority: root-to-tip variance, tree length, bipartition support

name root-to-tip_var treelength bipartition
uce-956_chunk_5.tre 2.3002e-12 8.56621e-05 0.0
uce-393_chunk_3.tre 5.54758e-12 9.22463e-05 0.0
uce-921_chunk_4.tre 5.8998e-12 7.89216e-05 0.0

In order of descending priority: root-to-tip variance, bipartition support, tree length

name root-to-tip_var treelength bipartition
uce-956_chunk_5.tre 2.3002e-12 8.56621e-05 0.0
uce-393_chunk_3.tre 5.54758e-12 9.22463e-05 0.0
uce-921_chunk_4.tre 5.8998e-12 7.89216e-05 0.0

In order of descending priority: tree length, root-to-tip variance, bipartition support

name root-to-tip_var treelength bipartition
uce-921_chunk_4.tre 5.8998e-12 7.89216e-05 0.0
uce-1173_chunk_3.tre 1.96604e-11 8.52591e-05 0.0
uce-956_chunk_5.tre 2.3002e-12 8.56621e-05 0.0

In order of descending priority: tree length, bipartition support, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-921_chunk_4.tre 5.8998e-12 7.89216e-05 0.0
uce-1173_chunk_3.tre 1.96604e-11 8.52591e-05 0.0
uce-956_chunk_5.tre 2.3002e-12 8.56621e-05 0.0

Step 1 of filtering: remove trees of zero length

No effect: the minimum observed tree length is non-zero and equal to:

## [1] 7.89216e-05
## Number of trees after filtering: 4237
## Percentage of trees after filtering: 100%

Step 2 of filtering: remove the shortest and longest 10% of trees:

## Original min. tree length: 7.89216e-05
## Original max. tree length: 214.974
## Min. tree length after step 2: 0.285338
## Max. tree length after step 2: 24.1424
## Number of trees after step 2: 3389
## Percentage of trees after step 2: 79.98584%

Step 3 of filtering: remove the most unclocklike third of loci

## Max. root-to-tip variance from step 2: 2.72349
## Max. root-to-tip variance after step 3: 0.00905244
## Number of trees after step 3: 2270
## Percentage of trees after step 3: 53.57564%

Step 4 of filtering: keep the loci in the upper 10% of bipartition support

## Min. bipartition support from step 3: 0
## Min. bipartition support after step 4: 0.1034483
## Number of trees after step 4: 170
## Percentage of trees after step 4: 4.012273%

50-bp chunks from the 75%-complete dataset: SH-like of 75% or above

The nodes with SH-like support values below 75% were collapsed using the script above. The corresponding FASTA files were copied into the resulting directory, and SortaDate was run as follows:

Phase 1: get_var_length

python src/get_var_length.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Chunks_75/ --flend .tre --outf Min-90-chunk75_analysis/var-min-90-chunks75 --outg alepisaurus_ferox,ceratoscopelus_warmingii

Running the script resulted in occasional segmentation faults as well as “this really only works with nexus or newick” warning messages. The resulting file had the full number of lines (4237: one for each tree), but some of them were blank and others contained NAs.

Phase 2: get_bp_genetrees

python src/get_bp_genetrees.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Chunks_75/ /Users/David/Grive/Alfaro_Lab/Acanthomorpha/12_no_outgroups_scheme_3.tre --flend .tre --outf Min-90-chunk75_analysis/bp-min-90-chunks75

Phase 3: combine_results

python src/combine_results.py Min-90-chunk75_analysis/var-min-90-chunks75 Min-90-chunk75_analysis/bp-min-90-chunks75 --outf Min-90-chunk75_analysis/comb-min-90-chunks75

Delete the lines with NAs:

comb90chunks75 <- read.table("/Users/David/Grive/Alfaro_Lab/SortaDate/Min-90-chunk75_analysis/comb-min-90-chunks75", fill = TRUE)
filtered <- na.omit(comb90chunks75)
write.table(filtered,
      "/Users/David/Grive/Alfaro_Lab/SortaDate/Min-90-chunk75_analysis/comb-min-90-chunks75-filtered",
      sep = "\t",
      quote = FALSE,
      row.names = FALSE,
      col.names = FALSE)
nrow(filtered)

The length of the filtered comb file is 3589 lines, meaning that SortaDate failed to obtain root-to-tip variance and/or tree length for 648 trees.

Phase 4: get_good_genes

In order of descending priority: bipartition support, root-to-tip variance, tree length

name root-to-tip_var treelength bipartition
uce-157_chunk_7.tre 0.163848 13.508 0.198275862069
uce-1244_chunk_1.tre 0.040797 10.3002 0.181034482759
uce-126_chunk_3.tre 0.101031 13.9331 0.181034482759
## [1] "The Robinson-Foulds distances of the three best chunks from the reference tree are as follows:"
##                    chunk  rf
## 1064 uce-157_chunk_7.tre 126
##                    chunk  rf
## 710 uce-1244_chunk_1.tre 111
##                   chunk  rf
## 788 uce-126_chunk_3.tre 114

The chunk with the minimum RF distance from the reference tree:

##                    chunk rf
## 2016 uce-463_chunk_2.tre 85

In order of descending priority: bipartition support, tree length, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-157_chunk_7.tre 0.163848 13.508 0.198275862069
uce-1244_chunk_1.tre 0.040797 10.3002 0.181034482759
uce-126_chunk_3.tre 0.101031 13.9331 0.181034482759

In order of descending priority: root-to-tip variance, tree length, bipartition support

name root-to-tip_var treelength bipartition
uce-1175_chunk_4.tre 0.0 8.48388e-07 0.00862068965517
uce-547_chunk_2.tre 0.0 1.18678e-06 0.0172413793103
uce-815_chunk_8.tre 0.0 0.0202802 0.0

In order of descending priority: root-to-tip variance, bipartition support, tree length

name root-to-tip_var treelength bipartition
uce-67_chunk_4.tre 0.0 0.0214551 0.0603448275862
uce-200_chunk_2.tre 0.0 0.0317025 0.0344827586207
uce-525_chunk_3.tre 0.0 0.020706 0.0258620689655

In order of descending priority: tree length, root-to-tip variance, bipartition support

name root-to-tip_var treelength bipartition
uce-1175_chunk_4.tre 0.0 8.48388e-07 0.00862068965517
uce-547_chunk_2.tre 0.0 1.18678e-06 0.0172413793103
uce-815_chunk_8.tre 0.0 0.0202802 0.0

In order of descending priority: tree length, bipartition support, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-1175_chunk_4.tre 0.0 8.48388e-07 0.00862068965517
uce-547_chunk_2.tre 0.0 1.18678e-06 0.0172413793103
uce-815_chunk_8.tre 0.0 0.0202802 0.0

Step 1 of filtering: remove trees of zero length

No effect: the minimum observed tree length is non-zero and equal to:

## [1] 8.48388e-07
## Number of trees after filtering: 3589
## Percentage of trees after filtering: 100%

Step 2 of filtering: remove the shortest and longest 10% of trees:

## Original min. tree length: 8.48388e-07
## Original max. tree length: 209.395
## Min. tree length after step 2: 0.124261
## Max. tree length after step 2: 19.679
## Number of trees after step 2: 2871
## Percentage of trees after step 2: 79.99443%

Step 3 of filtering: remove the most unclocklike third of loci

## Max. root-to-tip variance from step 2: 6.66986
## Max. root-to-tip variance after step 3: 0.00645959
## Number of trees after step 3: 1923
## Percentage of trees after step 3: 53.58038%

Step 4 of filtering: keep the loci in the upper 10% of bipartition support

## Min. bipartition support from step 3: 0
## Min. bipartition support after step 4: 0.06034483
## Number of trees after step 4: 169
## Percentage of trees after step 4: 4.708833%

50-bp chunks from the 75%-complete dataset: SH-like of 90% or above

Phase 1: get_var_length

python src/get_var_length.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Chunks_90/ --flend .tre --outf Min-90-chunk90_analysis/var-min-90-chunks90 --outg alepisaurus_ferox,ceratoscopelus_warmingii

As in the previous case, the command led to a number of segfaults and “this really only works with nexus or newick” warnings, corresponding to lines in the resulting file that were either incomplete or included NAs.

Phase 2: get_bp_genetrees

python src/get_bp_genetrees.py /Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Chunks_90/ /Users/David/Grive/Alfaro_Lab/Acanthomorpha/12_no_outgroups_scheme_3.tre --flend .tre --outf Min-90-chunk90_analysis/bp-min-90-chunks90

Phase 3: combine_results

python src/combine_results.py Min-90-chunk90_analysis/var-min-90-chunks90 Min-90-chunk90_analysis/bp-min-90-chunks90 --outf Min-90-chunk90_analysis/comb-min-90-chunks90

Delete the lines with NAs:

comb90chunks90 <- read.table("/Users/David/Grive/Alfaro_Lab/SortaDate/Min-90-chunk90_analysis/comb-min-90-chunks90", fill = TRUE)
filtered <- na.omit(comb90chunks90)
write.table(filtered,
      "/Users/David/Grive/Alfaro_Lab/SortaDate/Min-90-chunk90_analysis/comb-min-90-chunks90-filtered",
      sep = "\t",
      quote = FALSE,
      row.names = FALSE,
      col.names = FALSE)
nrow(filtered)

The length of the filtered comb file is 2319 lines, meaning that SortaDate failed to obtain root-to-tip variance and/or tree length for 1918 trees.

Phase 4: get_good_genes

In order of descending priority: bipartition support, root-to-tip variance, tree length

name root-to-tip_var treelength bipartition
uce-625_chunk_10.tre 0.0675117 0.706842 0.0775862068966
uce-1317_chunk_8.tre 0.0166339 15.2225 0.0689655172414
uce-120_chunk_6.tre 0.187599 4.6936 0.0689655172414
## [1] "The Robinson-Foulds distances of the three best chunks from the reference tree are as follows:"
##                     chunk  rf
## 1686 uce-625_chunk_10.tre 106
##                    chunk  rf
## 629 uce-1317_chunk_8.tre 111
##                   chunk  rf
## 429 uce-120_chunk_6.tre 114

The chunk with the minimum RF distance from the reference tree:

##                    chunk rf
## 678 uce-1340_chunk_2.tre 85

In order of descending priority: bipartition support, tree length, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-625_chunk_10.tre 0.0675117 0.706842 0.0775862068966
uce-120_chunk_6.tre 0.187599 4.6936 0.0689655172414
uce-1317_chunk_8.tre 0.0166339 15.2225 0.0689655172414

In order of descending priority: root-to-tip variance, tree length, bipartition support

name root-to-tip_var treelength bipartition
uce-1175_chunk_4.tre 0.0 8.48388e-07 0.00862068965517
uce-675_chunk_5.tre 0.0 8.58914e-07 0.0
uce-537_chunk_3.tre 0.0 8.71304e-07 0.00862068965517

In order of descending priority: root-to-tip variance, bipartition support, tree length

name root-to-tip_var treelength bipartition
uce-1189_chunk_2.tre 0.0 1.52836 0.0431034482759
uce-739_chunk_2.tre 0.0 2.2254e-06 0.0344827586207
uce-67_chunk_4.tre 0.0 0.0214551 0.0344827586207

In order of descending priority: tree length, root-to-tip variance, bipartition support

name root-to-tip_var treelength bipartition
uce-1175_chunk_4.tre 0.0 8.48388e-07 0.00862068965517
uce-675_chunk_5.tre 0.0 8.58914e-07 0.0
uce-537_chunk_3.tre 0.0 8.71304e-07 0.00862068965517

In order of descending priority: tree length, bipartition support, root-to-tip variance

name root-to-tip_var treelength bipartition
uce-1175_chunk_4.tre 0.0 8.48388e-07 0.00862068965517
uce-675_chunk_5.tre 0.0 8.58914e-07 0.0
uce-537_chunk_3.tre 0.0 8.71304e-07 0.00862068965517

Step 1 of filtering: remove trees of zero length

No effect: the minimum observed tree length is non-zero and equal to:

## [1] 8.48388e-07
## Number of trees after filtering: 2319
## Percentage of trees after filtering: 100%

Step 2 of filtering: remove the shortest and longest 10% of trees:

## Original min. tree length: 8.48388e-07
## Original max. tree length: 141.225
## Min. tree length after step 2: 0.111172
## Max. tree length after step 2: 20.8823
## Number of trees after step 2: 1855
## Percentage of trees after step 2: 79.99138%

Step 3 of filtering: remove the most unclocklike third of loci

## Max. root-to-tip variance from step 2: 40.1753
## Max. root-to-tip variance after step 3: 0.00177898
## Number of trees after step 3: 1243
## Percentage of trees after step 3: 53.60069%

Step 4 of filtering: keep the loci in the upper 10% of bipartition support

## Min. bipartition support from step 3: 0
## Min. bipartition support after step 4: 0.03448276
## Number of trees after step 4: 66
## Percentage of trees after step 4: 2.846054%

Out of the 66 chunks selected from the SH-like > 0.9 dataset, 65 were 50-bp long. To facilitate PartitionFinder searches, these chunks were aligned first using SequenceMatrix v1.8, with the last (47-bp) chunk attached to the resulting alignment afterwards. This made it easier to automate the calculation of base pair ranges for each locus (the latter needed to be included in the PartitionFinder configuration file). The outgroups (Alepisaurus ferox and Ceratoscopelus warmingii) were excluded from the concatenated alignment, as they were not present in the available topology constraint.

PartitionFinder was first run with the “rcluster” search option. Although the “models” option was set to “all”, since rcluster searches can only be performed in RAxML, this effectively limited the analysis to the three models implemented in the latter software (GTR, GTR+\(\Gamma\), GTR+\(\Gamma\)+I). This run yielded 8 subsets of the following properties:

Subset Best Model Sites Rate under GTR+\(\Gamma\) ID
1 GTR+\(\Gamma\) 1047 1.048826 76dec7513c9b37738bfac30d5d512cf3
2 GTR+I+\(\Gamma\) 1100 0.672610 111ec939b8c23b0e4e2529cee50f387a
3 GTR+\(\Gamma\) 550 0.281570 00fea7ffef92b99d78df6650b56ebb0c
4 GTR+\(\Gamma\) 100 0.965800 065a20f09eb1fa7383dedb55df51e100
5 GTR+\(\Gamma\) 200 2.278670 a6afbf36fc27e68c089e09c31ca3c71f
6 GTR+\(\Gamma\) 50 1.682760 da54ec6f6966e6b696433ef124df9e1f
7 GTR+I+\(\Gamma\) 50 2.739838 2d0d981211d9b6e3f364c1fcd221b555
8 GTR+\(\Gamma\) 200 1.356237 fd9894d33a357dcdab10c55c5153eb16

In the next step, the alignment was analyzed under the “greedy” search option, with the BEAST model set and without the “–raxml” flag. This search recovered the following 14 partitions:

Subset Best Model Sites Tree size under the best model ID
1 HKY+\(\Gamma\)+X 100 6.07552 3f353939a600a6973a6a1e0608925da6
2 TRNEF+\(\Gamma\) 200 4.34326 ceefdb63d1de1122404ab11285f05f27
3 JC+\(\Gamma\) 50 2.80253 ac2a21040350ec54b4664bc752a13a45
4 TRNEF+\(\Gamma\) 300 1.53147 baed377a41689244a752bc9e3169c975
5 GTR+I+\(\Gamma\)+X 350 3.44224 ce05da9c4ae84b4a74cb468b18fcda82
6 K80+I+\(\Gamma\) 450 3.71596 16b9f71868c90950d0dd144fd5de511f
7 SYM+\(\Gamma\) 300 8.15389 c5076d62640bd9eeabe53de6e5bf7f7a
8 K80+\(\Gamma\) 150 9.20524 d5aeca1d1fad19fc89dbb336bad98cbd
9 K80+\(\Gamma\) 497 5.13898 e7f992b256534832ccdb46c09aa6e9a4
10 HKY+\(\Gamma\)+X 300 2.87180 4e722846245f8cd6600c32f3131120a6
11 TRNEF+\(\Gamma\) 250 5.87708 72ad9849e1f3be2cbffa1097935e00fc
12 GTR+I+\(\Gamma\)+X 50 36.99851 2d0d981211d9b6e3f364c1fcd221b555
13 JC+\(\Gamma\) 150 1.20339 d687ea29d8ab88ed27334eafc8164ec8
14 TRNEF+\(\Gamma\) 150 13.44953 00f862e3cabdf1be1b99bc5b1455ba78

(In PhyML, “tree size” denotes the sum of edge lengths: see http://github.com/stephaneguindon/phyml/blob/master/doc/phyml-manual.tex. “X” denotes estimated, as opposed to empirical, base frequencies.)

BEAST analysis

The two 50-bp partitions were removed. Each partition received its own substitution model, which was generally identical to that selected by PartitionFinder, except for those models that contained the I parameter (proportion of invariant sites). Since this parameter and \(\Gamma\) account for the same phenomenon (rate heterogeneity across sites), their simultaneous inclusion causes the resulting model to be non-identifiable, leading to potential mixing problems in MCMC simulations. The models that were not directly available in BEAUTi were implemented as follows:

  • TrNef: TN93 + equal base frequencies
  • K80: HKY85 + equal base frequencies
  • SYM: GTR + equal base frequencies

Each \(\Gamma\) rate heterogeneity distribution was discretized into 4 categories. The default improper prior on the relative rates (the allMus parameter) was set to a gamma distribution with a shape of 0.001 and a scale of 1000 following the recommendations at https://www.biostat.washington.edu/sites/default/files/modules//2016_SISMID_13_11.pdf. All the priors on substitution model parameters were kept at their default values.

In contrast to the substitution models, the parameters of the uncorrelated lognormal clock model were linked across partitions. The ucld.mean hyperparameter was assigned a lognormal hyperprior with a mean of 0.005 (in real space) and a log standard deviation of 1, with the initial value set equal to 0.005. A truncated exponential hyperprior with support (0, 1), a mean of 0.3, and an initial value of 0.1 was used for ucld.stdev.

The “fixed tree topology” operator mix was used (based on a user-supplied topology common to all partitions), with the tuning of the ucld.mean and ucld.stdev operators set to 0.9 (default value = 0.75) and their weight doubled (from 3.0 to 6.0). Default tuning values and weights were used for all the remaining operators.

The 12 internal calibrations were all implemented as exponentials, and the (Polymixia + Aphredoderus) received the corrected calibration whose 95th percentile was equal to 116.35 Ma. A truncated exponential distribution supported on (98 Ma, 143 Ma) was constructed to calibrate the root of the tree, with its mean equal to the midpoint of the support interval (22.5 without the offset).

Finally, the MCMC simulation was run for 500 million generations, with a sampling period of 1000 generations.

beast -threads 12 -beagle_SSE concatchunks.xml

The analysis finished after 12.597 days and the resulting log file was examined using LogAnalyser to determine the EES value of each parameter:

java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 50000000 -ess concatchunks.log
##                        statistic          median         ESS
## 1                      posterior -48625.69180000   3413.0600
## 2                          prior  -2049.74610000    844.2895
## 3                     likelihood -46575.83370000  32241.1992
## 4           treeModel.rootHeight    137.72570000   2238.3418
## 5              tmrca(Aipichthys)    120.61650000   1324.2513
## 6           tmrca(Berybolcensis)     55.65810000   6036.6497
## 7               tmrca(Calatomus)     23.59360000    718.9452
## 8          tmrca(Chaetodontidae)     37.91700000   6567.4848
## 9           tmrca(Eastmanalepes)     50.96060000  16986.7366
## 10            tmrca(Eobuglossus)     43.01970000   8226.7418
## 11            tmrca(Eucoelopoma)     56.77560000   5699.0608
## 12         tmrca(Homonotichthys)     98.71850000   1207.3403
## 13           tmrca(Mcconichthys)     66.97340000    688.1744
## 14                   tmrca(Mene)     59.19900000   4229.6624
## 15         tmrca(Ramphexocoetus)     53.71620000   3420.8121
## 16                   tmrca(Root)    137.72570000   2238.3418
## 17                 tmrca(Tarkus)     49.83070000  13887.1125
## 18     birthDeath.meanGrowthRate      0.01870000   3785.5560
## 19  birthDeath.relativeDeathRate      0.05030000 125120.0000
## 20            Subset1HKYGX.kappa      3.30330000 336740.0000
## 21     Subset1HKYGX.frequencies1      0.38220000 160000.0000
## 22     Subset1HKYGX.frequencies2      0.23020000 194740.0000
## 23     Subset1HKYGX.frequencies3      0.15200000 188220.0000
## 24     Subset1HKYGX.frequencies4      0.23390000 191870.0000
## 25            Subset1HKYGX.alpha      0.98970000 344070.0000
## 26          Subset2TRNEFG.kappa1      4.82390000 339050.0000
## 27          Subset2TRNEFG.kappa2      1.81510000 353590.0000
## 28           Subset2TRNEFG.alpha      0.67370000 343600.0000
## 29          Subset3TRNEFG.kappa1      3.05060000 330620.0000
## 30          Subset3TRNEFG.kappa2      4.86070000 327930.0000
## 31           Subset3TRNEFG.alpha      0.46710000 335870.0000
## 32              Subset4GTRIGX.ac      0.17860000 142440.0000
## 33              Subset4GTRIGX.ag      0.41800000  69343.9763
## 34              Subset4GTRIGX.at      0.10600000 158030.0000
## 35              Subset4GTRIGX.cg      0.34480000 106840.0000
## 36              Subset4GTRIGX.gt      0.14930000 132440.0000
## 37    Subset4GTRIGX.frequencies1      0.30180000 109180.0000
## 38    Subset4GTRIGX.frequencies2      0.21230000  87820.5730
## 39    Subset4GTRIGX.frequencies3      0.31520000  91691.2293
## 40    Subset4GTRIGX.frequencies4      0.16960000  98336.1071
## 41           Subset4GTRIGX.alpha      0.44590000 276290.0000
## 42            Subset5K80IG.kappa      3.56400000 364980.0000
## 43            Subset5K80IG.alpha      0.31900000 370210.0000
## 44                Subset6SYMG.ac      0.31450000 233040.0000
## 45                Subset6SYMG.ag      1.61400000 171440.0000
## 46                Subset6SYMG.at      0.24760000 241230.0000
## 47                Subset6SYMG.cg      1.23840000 197650.0000
## 48                Subset6SYMG.gt      0.24490000 285190.0000
## 49             Subset6SYMG.alpha      0.52120000 361230.0000
## 50             Subset7K80G.kappa      2.93730000 365130.0000
## 51             Subset7K80G.alpha      0.70830000 361470.0000
## 52             Subset8K80G.kappa      3.17480000 373600.0000
## 53             Subset8K80G.alpha      1.22040000 364000.0000
## 54            Subset9HKYGX.kappa      3.59770000 358280.0000
## 55     Subset9HKYGX.frequencies1      0.20910000 185070.0000
## 56     Subset9HKYGX.frequencies2      0.25900000 189260.0000
## 57     Subset9HKYGX.frequencies3      0.30580000 174970.0000
## 58     Subset9HKYGX.frequencies4      0.22510000 192860.0000
## 59            Subset9HKYGX.alpha      0.62780000 344070.0000
## 60         Subset10TRNEFG.kappa1      4.17550000 337400.0000
## 61         Subset10TRNEFG.kappa2      6.02560000 342470.0000
## 62          Subset10TRNEFG.alpha      0.50600000 338290.0000
## 63             Subset11JCG.alpha      0.38770000 282140.0000
## 64         Subset12TRNEFG.kappa1      3.08980000 324920.0000
## 65         Subset12TRNEFG.kappa2      7.49850000 321000.0000
## 66          Subset12TRNEFG.alpha      0.33910000 220060.0000
## 67               Subset1HKYGX.mu      1.17860000 183520.0000
## 68              Subset2TRNEFG.mu      0.86830000 155210.0000
## 69              Subset3TRNEFG.mu      0.28780000 112480.0000
## 70              Subset4GTRIGX.mu      0.71110000 105770.0000
## 71               Subset5K80IG.mu      0.81240000  89865.7064
## 72                Subset6SYMG.mu      1.65800000  86079.4311
## 73                Subset7K80G.mu      1.92930000 131010.0000
## 74                Subset8K80G.mu      1.05670000  92400.6821
## 75               Subset9HKYGX.mu      0.59180000 131650.0000
## 76             Subset10TRNEFG.mu      1.18620000 132450.0000
## 77                Subset11JCG.mu      0.20350000 110950.0000
## 78             Subset12TRNEFG.mu      2.54390000  79083.6503
## 79                     ucld.mean      0.00073594    332.5618
## 80                    ucld.stdev      0.98430000   2649.2935
## 81                      meanRate      0.00068979    700.5540
## 82        coefficientOfVariation      1.27000000    226.7723
## 83                    covariance      0.23290000    433.6862
## 84   Subset1HKYGX.treeLikelihood  -1890.61200000  36770.2075
## 85  Subset2TRNEFG.treeLikelihood  -2931.73400000  35236.5308
## 86  Subset3TRNEFG.treeLikelihood  -2228.42960000  36860.1607
## 87  Subset4GTRIGX.treeLikelihood  -4743.64810000  29583.0309
## 88   Subset5K80IG.treeLikelihood  -5280.81190000  32435.6649
## 89    Subset6SYMG.treeLikelihood  -5733.31740000  32350.1559
## 90    Subset7K80G.treeLikelihood  -3537.00090000  26073.9083
## 91    Subset8K80G.treeLikelihood  -8747.29310000  18666.6737
## 92   Subset9HKYGX.treeLikelihood  -3646.88110000  24260.8424
## 93 Subset10TRNEFG.treeLikelihood  -4255.05270000  39038.3017
## 94    Subset11JCG.treeLikelihood   -949.70740000   9002.7411
## 95 Subset12TRNEFG.treeLikelihood  -2630.61690000  25585.5111
## 96                   branchRates      0.00000000   1000.0000
## 97                    speciation   -580.10770000    566.5915

Since all the ESS values exceeded 200, the sample from the posterior was summarized using TreeAnnotator:

java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 50000000 -heights median concatchunks.trees chunks-BEAST.tre
Comparison with the MCMCTree-generated tree from the manuscript

Comparison with the MCMCTree-generated tree from the manuscript

Five more replicates of the analysis above were run. These finished after 7.687, 7.755, 8.275, 8.365, and 8.386 days, respectively. Each was summarized using TreeAnnotator, after passing the corresponding log file to LogAnalyser to ensure that no parameter had an effective sample size smaller than 200:

/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 50000000 -heights median concatchunks.trees repl5-BEAST-uncorr-lognorm-12-calib.tre

While the LogAnalyser results are based on the full log files (comprising 500,000 samples), the maximum clade credibility trees were constructed from .trees file that were resampled at a frequency of 1 sample per 25,000 generations to limit the file size. Similarly, after the LogAnalyser examination, the log files were resampled with the same frequency in order to be downloaded and saved to the local machine.

/home/linuxbrew/.linuxbrew/bin/logcombiner -trees -resample 25000 concatchunks.trees repl1-concatchunks-resampled.trees
/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 50000000 -heights median repl3-concatchunks-resampled.trees repl3-BEAST-uncorr-lognorm-12-calib.tre
/home/linuxbrew/.linuxbrew/bin/logcombiner -resample 25000 concatchunks.log repl1-concatchunks-resampled.log

BEAST: Random local clocks

Two more analyses were performed using BEAST; their settings were identical to those described above except for the clock model used. In both cases, a random local clock model was utilized, with one analysis using the uncorrelated version of the model (with the ratesAreMultipliers option in the XML file set to the default value of "false") and the other employing the correlated version (ratesAreMultipliers="true"). In both analyses, default priors were placed on the parameters of the clock model.

beast -threads 6 -beagle_SSE concatchunks-urlc.xml
beast -threads 6 -beagle_SSE concatchunks-crlc.xml

The correlated analysis finished up after 29.926 days, with the following ESS values calculated using LogAnalyser for individual parameters:

java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 50000000 -ess concatchunks-crlc.log
##                        statistic          median         ESS
## 1                      posterior -47786.23460000   1202.3104
## 2                          prior  -1129.67560000    717.9970
## 3                     likelihood -46656.39340000    945.4421
## 4           treeModel.rootHeight    135.05800000    414.7867
## 5              tmrca(Aipichthys)    126.00640000    494.8186
## 6           tmrca(Berybolcensis)     53.51960000  12275.7790
## 7               tmrca(Calatomus)     25.96070000   1669.9858
## 8          tmrca(Chaetodontidae)     58.85270000   1152.7407
## 9           tmrca(Eastmanalepes)     50.97210000   3342.9559
## 10            tmrca(Eobuglossus)     42.50760000   6213.4747
## 11            tmrca(Eucoelopoma)     58.26010000   1811.6704
## 12         tmrca(Homonotichthys)    108.43010000    717.7257
## 13           tmrca(Mcconichthys)     66.02220000   1796.0528
## 14                   tmrca(Mene)     67.44310000    281.0520
## 15         tmrca(Ramphexocoetus)     55.51120000    433.6284
## 16                   tmrca(Root)    135.05800000    414.7867
## 17                 tmrca(Tarkus)     49.61740000   4273.3358
## 18     birthDeath.meanGrowthRate      0.01700000   2459.6700
## 19  birthDeath.relativeDeathRate      0.05110000 287740.0000
## 20            Subset1HKYGX.kappa      3.30800000 334490.0000
## 21     Subset1HKYGX.frequencies1      0.38240000 160420.0000
## 22     Subset1HKYGX.frequencies2      0.23030000 192550.0000
## 23     Subset1HKYGX.frequencies3      0.15190000 182740.0000
## 24     Subset1HKYGX.frequencies4      0.23360000 192820.0000
## 25            Subset1HKYGX.alpha      0.98960000 337920.0000
## 26          Subset2TRNEFG.kappa1      4.82420000 342630.0000
## 27          Subset2TRNEFG.kappa2      1.80750000 347460.0000
## 28           Subset2TRNEFG.alpha      0.67210000 345030.0000
## 29          Subset3TRNEFG.kappa1      3.04650000 327610.0000
## 30          Subset3TRNEFG.kappa2      4.85740000 326980.0000
## 31           Subset3TRNEFG.alpha      0.46690000 334550.0000
## 32              Subset4GTRIGX.ac      0.17990000 142240.0000
## 33              Subset4GTRIGX.ag      0.42000000  68312.1561
## 34              Subset4GTRIGX.at      0.10480000 157680.0000
## 35              Subset4GTRIGX.cg      0.34860000 103710.0000
## 36              Subset4GTRIGX.gt      0.14910000 131850.0000
## 37    Subset4GTRIGX.frequencies1      0.30190000 106050.0000
## 38    Subset4GTRIGX.frequencies2      0.21160000  87216.1881
## 39    Subset4GTRIGX.frequencies3      0.31440000  90157.5159
## 40    Subset4GTRIGX.frequencies4      0.17080000  92338.2633
## 41           Subset4GTRIGX.alpha      0.44550000 266720.0000
## 42            Subset5K80IG.kappa      3.56330000 360360.0000
## 43            Subset5K80IG.alpha      0.31890000 368450.0000
## 44                Subset6SYMG.ac      0.31440000 234590.0000
## 45                Subset6SYMG.ag      1.60770000 179060.0000
## 46                Subset6SYMG.at      0.24660000 242470.0000
## 47                Subset6SYMG.cg      1.23110000 210510.0000
## 48                Subset6SYMG.gt      0.24460000 280930.0000
## 49             Subset6SYMG.alpha      0.52200000 360210.0000
## 50             Subset7K80G.kappa      2.93690000 360870.0000
## 51             Subset7K80G.alpha      0.70900000 347840.0000
## 52             Subset8K80G.kappa      3.17650000 370730.0000
## 53             Subset8K80G.alpha      1.22120000 373330.0000
## 54            Subset9HKYGX.kappa      3.59400000 350940.0000
## 55     Subset9HKYGX.frequencies1      0.20940000 190900.0000
## 56     Subset9HKYGX.frequencies2      0.25890000 189780.0000
## 57     Subset9HKYGX.frequencies3      0.30580000 174610.0000
## 58     Subset9HKYGX.frequencies4      0.22510000 192130.0000
## 59            Subset9HKYGX.alpha      0.62730000 343320.0000
## 60         Subset10TRNEFG.kappa1      4.17600000 340970.0000
## 61         Subset10TRNEFG.kappa2      6.01240000 341760.0000
## 62          Subset10TRNEFG.alpha      0.50610000 329970.0000
## 63             Subset11JCG.alpha      0.38680000 274790.0000
## 64         Subset12TRNEFG.kappa1      3.09240000 325270.0000
## 65         Subset12TRNEFG.kappa2      7.48350000 306510.0000
## 66          Subset12TRNEFG.alpha      0.33950000 208810.0000
## 67               Subset1HKYGX.mu      1.18540000 186190.0000
## 68              Subset2TRNEFG.mu      0.87230000 157190.0000
## 69              Subset3TRNEFG.mu      0.28860000 113310.0000
## 70              Subset4GTRIGX.mu      0.71450000 101590.0000
## 71               Subset5K80IG.mu      0.80740000  91318.3057
## 72                Subset6SYMG.mu      1.64800000  85955.1006
## 73                Subset7K80G.mu      1.92330000 132780.0000
## 74                Subset8K80G.mu      1.06010000  97761.3964
## 75               Subset9HKYGX.mu      0.59170000 131630.0000
## 76             Subset10TRNEFG.mu      1.18630000 129470.0000
## 77                Subset11JCG.mu      0.20340000 109330.0000
## 78             Subset12TRNEFG.mu      2.55320000  72796.4880
## 79                    clock.rate      0.00063021    436.3276
## 80               rateChangeCount     39.00000000    187.2791
## 81                      meanRate      0.00063021    436.3276
## 82        coefficientOfVariation      1.37330000    374.7456
## 83                    covariance      0.70240000    956.0224
## 84   Subset1HKYGX.treeLikelihood  -1889.53140000    723.3662
## 85  Subset2TRNEFG.treeLikelihood  -2945.46670000   1356.2685
## 86  Subset3TRNEFG.treeLikelihood  -2230.45120000    621.7841
## 87  Subset4GTRIGX.treeLikelihood  -4752.07410000   1060.1094
## 88   Subset5K80IG.treeLikelihood  -5294.63300000    483.4810
## 89    Subset6SYMG.treeLikelihood  -5738.18580000    663.2128
## 90    Subset7K80G.treeLikelihood  -3548.57130000    668.2998
## 91    Subset8K80G.treeLikelihood  -8752.68650000    894.9200
## 92   Subset9HKYGX.treeLikelihood  -3654.65200000   1209.9001
## 93 Subset10TRNEFG.treeLikelihood  -4263.94590000   1428.0418
## 94    Subset11JCG.treeLikelihood   -948.69150000   2342.0209
## 95 Subset12TRNEFG.treeLikelihood  -2636.67400000    751.9278
## 96                   branchRates      0.00000000   1000.0000
## 97                    speciation   -591.13650000    334.1591

The effective sample sizes for all parameters except rateChangeCount exceeded 200. Despite this, the maximum clade credibility tree was constructed from a resampled .trees file:

java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -trees -resample 25000 concatchunks-crlc.trees concatchunks-crlc-resampled.trees
java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 50000000 -heights median concatchunks-crlc-resampled.trees concatchunks-crlc.tre

The uncorrelated analysis reached the target of 500 million generations after 34.955 days and was subject to a LogAnalyser examination in order to compute the effective sample sizes of the estimated parameters:

As in the correlated analysis summarized above, only a single parameter had an ESS value of less than 200 (tmrca(Homonotichthys)). This result was regarded as acceptable, and steps were therefore taken to summarize the posterior. The .trees file was resampled at a frequency of 1 saved tree per 25,000 generations in order to speed up the search for the maximum credibility tree shown in the comparison below.

BEAST: Extended calibration set

Another analysis was started with settings identical to the first one (in particular, with the uncorrelated lognormal relaxed clock model), but with the extended dataset of 28 calibrations:

## [1] "Aipichthys = 15.02136903129"
## [1] "Archaeotetraodon = 8.83924115441245"
## [1] "Archaeus = 8.88931238451675"
## [1] "Berybolcensis = 14.9212265710814"
## [1] "Calatomus = 17.1577415157402"
## [1] "Chaetodontidae = 12.1439423412963"
## [1] "Cretzeus = 13.8830830669189"
## [1] "Eastmanalepes = 6.77630647411528"
## [1] "Eobothus = 10.2479117613468"
## [1] "Eobuglossus = 8.47872829766149"
## [1] "Eocoelopoma = 13.3623422738342"
## [1] "Gasterorhamphosus = 13.8830830669189"
## [1] "Homonotichthys = 7.49399410561025"
## [1] "Hoplopteryx = 11.3828596437109"
## [1] "Luvarus = 8.88931238451675"
## [1] "Mahengechromis = 7.32375192325563"
## [1] "Massamorichthys = 17.0405748372961"
## [1] "Mene = 13.1520431073962"
## [1] "Proacanthurus = 6.77630647411528"
## [1] "Prosolenostomus = 14.9212265710814"
## [1] "Rhamphexocoetus = 10.2479117613468"
## [1] "Rhinocephalus = 13.5859937683001"
## [1] "Sphyraena = 10.2479117613468"
## [1] "Tarkus = 4.20598332876121"
## [1] "Trachipterus = 29.5987731556553"
## [1] "Triodon = 6.37573663328088"
## [1] "Turkmene = 22.8091143535122"
## [1] "Zenopsis = 19.8215309572889"
/home/linuxbrew/.linuxbrew/bin/beast -beagle_SSE -beagle_scaling dynamic concatchunks-ext.xml

This run finished after 7.059 days, with the following effective sample sizes for individual parameters (as determined by LogAnalyser):

/home/linuxbrew/.linuxbrew/bin/loganalyser -burnin 50000000 -ess concatchunks-ext.log
##                         statistic          median        ESS
## 1                       posterior -48684.38740000  2763.4231
## 2                           prior  -2109.18830000   721.1500
## 3                      likelihood -46575.16520000 13844.4609
## 4            treeModel.rootHeight    136.14570000  1743.5335
## 5               tmrca(Aipichthys)    122.76580000  1222.0035
## 6         tmrca(Archaeotetraodon)     36.63320000   198.8701
## 7                 tmrca(Archaeus)     56.60770000  8352.3874
## 8            tmrca(Berybolcensis)     54.67900000  7254.2808
## 9                tmrca(Calatomus)     26.00620000   399.8151
## 10          tmrca(Chaetodontidae)     41.24440000  6296.0075
## 11                tmrca(Cretzeus)     94.24210000   135.2974
## 12           tmrca(Eastmanalepes)     52.64200000 14885.0059
## 13                tmrca(Eobothus)     57.80940000  1785.4873
## 14             tmrca(Eobuglossus)     43.29610000  4816.3060
## 15             tmrca(Eocoelopoma)     56.88220000  5205.1277
## 16       tmrca(Gasterorhamphosus)     75.07320000  2132.2268
## 17          tmrca(Homonotichthys)     99.30460000  1329.7146
## 18             tmrca(Hoplopteryx)    127.44200000  1086.9194
## 19                 tmrca(Luvarus)     65.22740000   809.0498
## 20          tmrca(Mahengechromis)     48.20490000  7659.4325
## 21         tmrca(Massamorichthys)     63.21960000   600.5667
## 22                    tmrca(Mene)     58.76210000  4622.4281
## 23           tmrca(Proacanthurus)     51.23210000  6784.3879
## 24         tmrca(Prosolenostomus)     59.10850000   821.7280
## 25         tmrca(Rhamphexocoetus)     54.45010000  6244.0991
## 26           tmrca(Rhinocephalus)     56.57650000  2295.8754
## 27                    tmrca(Root)    136.14570000  1743.5335
## 28               tmrca(Sphyraena)     59.69350000  8346.0325
## 29                  tmrca(Tarkus)     49.83670000  9320.9490
## 30            tmrca(Trachipterus)     25.06320000   870.6285
## 31                 tmrca(Triodon)     53.63450000   612.9674
## 32                tmrca(Turkmene)     64.00440000   855.3938
## 33                tmrca(Zenopsis)     37.80840000   420.6294
## 34      birthDeath.meanGrowthRate      0.01850000  3892.3232
## 35   birthDeath.relativeDeathRate      0.04550000 15134.3519
## 36             Subset1HKYGX.kappa      3.30200000 18001.0000
## 37      Subset1HKYGX.frequencies1      0.38230000 18001.0000
## 38      Subset1HKYGX.frequencies2      0.23000000 18001.0000
## 39      Subset1HKYGX.frequencies3      0.15230000 18001.0000
## 40      Subset1HKYGX.frequencies4      0.23400000 18001.0000
## 41             Subset1HKYGX.alpha      0.98980000 17491.3422
## 42           Subset2TRNEFG.kappa1      4.81920000 18001.0000
## 43           Subset2TRNEFG.kappa2      1.81260000 17951.5243
## 44            Subset2TRNEFG.alpha      0.67220000 17988.8690
## 45           Subset3TRNEFG.kappa1      3.04990000 17514.5608
## 46           Subset3TRNEFG.kappa2      4.84920000 18001.0000
## 47            Subset3TRNEFG.alpha      0.46630000 17907.2312
## 48               Subset4GTRIGX.ac      0.17760000 18001.0000
## 49               Subset4GTRIGX.ag      0.41520000 17530.1025
## 50               Subset4GTRIGX.at      0.10570000 18001.0000
## 51               Subset4GTRIGX.cg      0.34220000 17165.8646
## 52               Subset4GTRIGX.gt      0.14880000 17568.1246
## 53     Subset4GTRIGX.frequencies1      0.30190000 17366.2591
## 54     Subset4GTRIGX.frequencies2      0.21200000 17969.0065
## 55     Subset4GTRIGX.frequencies3      0.31540000 17187.6766
## 56     Subset4GTRIGX.frequencies4      0.16930000 17966.0438
## 57            Subset4GTRIGX.alpha      0.44540000 18001.0000
## 58             Subset5K80IG.kappa      3.56170000 18001.0000
## 59             Subset5K80IG.alpha      0.31900000 18001.0000
## 60                 Subset6SYMG.ac      0.31350000 16789.4178
## 61                 Subset6SYMG.ag      1.60860000 18001.0000
## 62                 Subset6SYMG.at      0.24720000 17544.8122
## 63                 Subset6SYMG.cg      1.23730000 17490.5724
## 64                 Subset6SYMG.gt      0.24400000 18001.0000
## 65              Subset6SYMG.alpha      0.51970000 18001.0000
## 66              Subset7K80G.kappa      2.93720000 18001.0000
## 67              Subset7K80G.alpha      0.70890000 18001.0000
## 68              Subset8K80G.kappa      3.17600000 17757.1075
## 69              Subset8K80G.alpha      1.21870000 17392.0732
## 70             Subset9HKYGX.kappa      3.59600000 17229.8960
## 71      Subset9HKYGX.frequencies1      0.20940000 17411.0904
## 72      Subset9HKYGX.frequencies2      0.25880000 18001.0000
## 73      Subset9HKYGX.frequencies3      0.30590000 17833.6630
## 74      Subset9HKYGX.frequencies4      0.22510000 17823.9222
## 75             Subset9HKYGX.alpha      0.62670000 17827.3347
## 76          Subset10TRNEFG.kappa1      4.17290000 17254.1945
## 77          Subset10TRNEFG.kappa2      6.01360000 18001.0000
## 78           Subset10TRNEFG.alpha      0.50490000 17516.4006
## 79              Subset11JCG.alpha      0.38730000 18001.0000
## 80          Subset12TRNEFG.kappa1      3.08330000 17424.3666
## 81          Subset12TRNEFG.kappa2      7.49460000 18001.0000
## 82           Subset12TRNEFG.alpha      0.33860000 17339.1346
## 83                Subset1HKYGX.mu      1.18330000 17737.5069
## 84               Subset2TRNEFG.mu      0.86810000 17265.7801
## 85               Subset3TRNEFG.mu      0.28770000 18001.0000
## 86               Subset4GTRIGX.mu      0.71030000 17282.8179
## 87                Subset5K80IG.mu      0.81110000 17571.3579
## 88                 Subset6SYMG.mu      1.65750000 18001.0000
## 89                 Subset7K80G.mu      1.93380000 18001.0000
## 90                 Subset8K80G.mu      1.05690000 16988.5077
## 91                Subset9HKYGX.mu      0.59210000 18001.0000
## 92              Subset10TRNEFG.mu      1.18530000 17556.4438
## 93                 Subset11JCG.mu      0.20370000 18001.0000
## 94              Subset12TRNEFG.mu      2.55030000 16378.8994
## 95                      ucld.mean      0.00072704   197.9636
## 96                     ucld.stdev      0.98360000   866.6601
## 97                       meanRate      0.00068166   771.9061
## 98         coefficientOfVariation      1.29570000   143.9961
## 99                     covariance      0.23880000   208.4930
## 100   Subset1HKYGX.treeLikelihood  -1890.75370000 15350.8909
## 101  Subset2TRNEFG.treeLikelihood  -2931.99210000 15300.1650
## 102  Subset3TRNEFG.treeLikelihood  -2228.20130000 15363.0960
## 103  Subset4GTRIGX.treeLikelihood  -4743.39680000 12599.6925
## 104   Subset5K80IG.treeLikelihood  -5281.05330000 16367.2093
## 105    Subset6SYMG.treeLikelihood  -5733.17950000 13937.6508
## 106    Subset7K80G.treeLikelihood  -3536.68580000 11820.6143
## 107    Subset8K80G.treeLikelihood  -8747.09380000 13462.9728
## 108   Subset9HKYGX.treeLikelihood  -3646.62540000 13764.4572
## 109 Subset10TRNEFG.treeLikelihood  -4254.89780000 12708.1486
## 110    Subset11JCG.treeLikelihood   -949.69550000  8979.5379
## 111 Subset12TRNEFG.treeLikelihood  -2630.68180000 15615.8354
## 112                    speciation   -581.46880000   546.6950

Since four parameters had ESS values of less than 200 (tmrca(Archaeotetraodon), tmrca(Cretzeus), ucld.mean, coefficientOfVariation), a new run was started with identical settings and the chain length set to 250 million generations:

/home/linuxbrew/.linuxbrew/bin/beast -beagle_SSE -beagle_scaling dynamic concatchunks-ext2.xml

The second run finished after 4.384 days.

The next steps were as follows:

  1. Remove the first 10% of samples as burnin from both chains to produce truncated log files:

    /home/linuxbrew/.linuxbrew/bin/logcombiner -burnin 50000001 concatchunks-ext.log concatchunks-ext-no-burnin.log
    /home/linuxbrew/.linuxbrew/bin/logcombiner -burnin 25000001 concatchunks-ext2.log concatchunks-ext2-no-burnin.log
  2. Combine the truncated log files:

    /home/linuxbrew/.linuxbrew/bin/logcombiner -burnin 0 concatchunks-ext-no-burnin.log concatchunks-ext2-no-burnin.log concatchunks-ext-combined.log
  3. Analyze the combined log file to assess ESS values:

    /home/linuxbrew/.linuxbrew/bin/loganalyser -burnin 0 -ess concatchunks-ext-combined.log

Although the ESS values for the 4 previously problematic parameters all exceeded 200 when the two runs were combined, the ESS value of one more parameter (tmrca(Tarkus)) unexpectedly dropped below the threshold:

##                         statistic          median        ESS
## 1                       posterior -48684.61890000  4018.7952
## 2                           prior  -2109.34860000  1006.3031
## 3                      likelihood -46575.17930000 21777.6112
## 4            treeModel.rootHeight    136.07150000  2642.6995
## 5               tmrca(Aipichthys)    122.59890000  2219.8782
## 6         tmrca(Archaeotetraodon)     36.68160000   300.9514
## 7                 tmrca(Archaeus)     56.60900000 12713.2916
## 8            tmrca(Berybolcensis)     54.64290000 11218.4625
## 9                tmrca(Calatomus)     25.72650000   613.1224
## 10          tmrca(Chaetodontidae)     41.25430000  8190.6974
## 11                tmrca(Cretzeus)     93.90460000   237.4754
## 12           tmrca(Eastmanalepes)     52.64260000 22170.2330
## 13                tmrca(Eobothus)     57.91220000  2680.7794
## 14             tmrca(Eobuglossus)     43.31750000  7470.8790
## 15             tmrca(Eocoelopoma)     56.87570000  8276.2164
## 16       tmrca(Gasterorhamphosus)     75.05650000  3230.4774
## 17          tmrca(Homonotichthys)     99.22490000  2555.9326
## 18             tmrca(Hoplopteryx)    127.20650000  1478.2580
## 19                 tmrca(Luvarus)     65.31620000  1396.4432
## 20          tmrca(Mahengechromis)     48.17900000 11840.4841
## 21         tmrca(Massamorichthys)     63.33570000  1009.8491
## 22                    tmrca(Mene)     58.82670000  6847.9146
## 23           tmrca(Proacanthurus)     51.25330000 10184.5107
## 24         tmrca(Prosolenostomus)     59.06560000  1226.1886
## 25         tmrca(Rhamphexocoetus)     54.36880000  8583.6205
## 26           tmrca(Rhinocephalus)     56.60650000  3439.4406
## 27                    tmrca(Root)    136.07150000  2642.6995
## 28               tmrca(Sphyraena)     59.76390000 12653.9557
## 29                  tmrca(Tarkus)     49.98980000   126.1217
## 30            tmrca(Trachipterus)     25.29110000  1336.7531
## 31                 tmrca(Triodon)     53.71510000   937.0322
## 32                tmrca(Turkmene)     64.05950000  1416.0132
## 33                tmrca(Zenopsis)     37.73790000   802.1615
## 34      birthDeath.meanGrowthRate      0.01850000  5668.8584
## 35   birthDeath.relativeDeathRate      0.04530000 23256.0832
## 36             Subset1HKYGX.kappa      3.30360000 26678.2412
## 37      Subset1HKYGX.frequencies1      0.38220000 27000.0000
## 38      Subset1HKYGX.frequencies2      0.23020000 27000.0000
## 39      Subset1HKYGX.frequencies3      0.15220000 27000.0000
## 40      Subset1HKYGX.frequencies4      0.23400000 27000.0000
## 41             Subset1HKYGX.alpha      0.98870000 26584.5037
## 42           Subset2TRNEFG.kappa1      4.82280000 27000.0000
## 43           Subset2TRNEFG.kappa2      1.81350000 26752.9895
## 44            Subset2TRNEFG.alpha      0.67290000 26917.0274
## 45           Subset3TRNEFG.kappa1      3.05000000 27000.0000
## 46           Subset3TRNEFG.kappa2      4.84750000 27000.0000
## 47            Subset3TRNEFG.alpha      0.46620000 27000.0000
## 48               Subset4GTRIGX.ac      0.17750000 27000.0000
## 49               Subset4GTRIGX.ag      0.41500000 25570.6224
## 50               Subset4GTRIGX.at      0.10570000 27000.0000
## 51               Subset4GTRIGX.cg      0.34230000 26434.8037
## 52               Subset4GTRIGX.gt      0.14860000 26976.0510
## 53     Subset4GTRIGX.frequencies1      0.30190000 26472.4252
## 54     Subset4GTRIGX.frequencies2      0.21200000 26641.8780
## 55     Subset4GTRIGX.frequencies3      0.31550000 26599.0962
## 56     Subset4GTRIGX.frequencies4      0.16930000 26366.7657
## 57            Subset4GTRIGX.alpha      0.44520000 27000.0000
## 58             Subset5K80IG.kappa      3.55900000 27000.0000
## 59             Subset5K80IG.alpha      0.31910000 26404.5070
## 60                 Subset6SYMG.ac      0.31370000 25467.5781
## 61                 Subset6SYMG.ag      1.60920000 27000.0000
## 62                 Subset6SYMG.at      0.24730000 27000.0000
## 63                 Subset6SYMG.cg      1.23790000 26676.0355
## 64                 Subset6SYMG.gt      0.24420000 27000.0000
## 65              Subset6SYMG.alpha      0.52000000 26757.9001
## 66              Subset7K80G.kappa      2.93550000 27000.0000
## 67              Subset7K80G.alpha      0.70880000 27000.0000
## 68              Subset8K80G.kappa      3.17590000 27000.0000
## 69              Subset8K80G.alpha      1.22000000 26734.3603
## 70             Subset9HKYGX.kappa      3.59430000 26262.9651
## 71      Subset9HKYGX.frequencies1      0.20920000 26249.4586
## 72      Subset9HKYGX.frequencies2      0.25890000 26904.3496
## 73      Subset9HKYGX.frequencies3      0.30580000 26158.2443
## 74      Subset9HKYGX.frequencies4      0.22510000 26447.5814
## 75             Subset9HKYGX.alpha      0.62670000 26428.7697
## 76          Subset10TRNEFG.kappa1      4.17090000 26416.2617
## 77          Subset10TRNEFG.kappa2      6.01590000 27000.0000
## 78           Subset10TRNEFG.alpha      0.50510000 26076.5998
## 79              Subset11JCG.alpha      0.38700000 27000.0000
## 80          Subset12TRNEFG.kappa1      3.08690000 26675.3958
## 81          Subset12TRNEFG.kappa2      7.50120000 27000.0000
## 82           Subset12TRNEFG.alpha      0.33850000 27000.0000
## 83                Subset1HKYGX.mu      1.18280000 26828.6938
## 84               Subset2TRNEFG.mu      0.86800000 26255.6956
## 85               Subset3TRNEFG.mu      0.28790000 27000.0000
## 86               Subset4GTRIGX.mu      0.71010000 25129.2484
## 87                Subset5K80IG.mu      0.81080000 26461.5267
## 88                 Subset6SYMG.mu      1.65730000 27000.0000
## 89                 Subset7K80G.mu      1.93370000 26130.1157
## 90                 Subset8K80G.mu      1.05650000 27000.0000
## 91                Subset9HKYGX.mu      0.59240000 25762.3514
## 92              Subset10TRNEFG.mu      1.18560000 26783.9906
## 93                 Subset11JCG.mu      0.20370000 27000.0000
## 94              Subset12TRNEFG.mu      2.55020000 24805.2309
## 95                      ucld.mean      0.00072680   294.7377
## 96                     ucld.stdev      0.98350000  1554.3464
## 97                       meanRate      0.00068161   995.7414
## 98         coefficientOfVariation      1.29670000   231.1065
## 99                     covariance      0.24070000   453.1176
## 100   Subset1HKYGX.treeLikelihood  -1890.75160000 22436.7525
## 101  Subset2TRNEFG.treeLikelihood  -2932.00810000 23328.1981
## 102  Subset3TRNEFG.treeLikelihood  -2228.22220000 23305.2617
## 103  Subset4GTRIGX.treeLikelihood  -4743.39800000 18857.7749
## 104   Subset5K80IG.treeLikelihood  -5281.08270000 22518.3527
## 105    Subset6SYMG.treeLikelihood  -5733.24140000 21927.9543
## 106    Subset7K80G.treeLikelihood  -3536.71940000 17624.6335
## 107    Subset8K80G.treeLikelihood  -8746.96880000 16840.0116
## 108   Subset9HKYGX.treeLikelihood  -3646.70940000 21631.2317
## 109 Subset10TRNEFG.treeLikelihood  -4254.92840000 19433.2103
## 110    Subset11JCG.treeLikelihood   -949.68370000 14102.6829
## 111 Subset12TRNEFG.treeLikelihood  -2630.64210000 19591.9161
## 112                   branchRates      0.00000000 25000.0000
## 113                    speciation   -581.49490000   733.9149

To assess whether this reflects a more general problem with convergence, maximum clade credibility trees were constructed for both the first run alone (18,000 samples excluding burnin) and the combined run (27,000 samples excluding burnin) and compared to each other:

/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 50000000 -heights median concatchunks-ext.trees chunks-ext-run1.tre
/home/linuxbrew/.linuxbrew/bin/logcombiner -trees -burnin 50000001 concatchunks-ext.trees concatchunks-ext-no-burnin.trees
/home/linuxbrew/.linuxbrew/bin/logcombiner -trees -burnin 25000001 concatchunks-ext2.trees concatchunks-ext2-no-burnin.trees
/home/linuxbrew/.linuxbrew/bin/logcombiner -trees -burnin 0 concatchunks-ext-no-burnin.trees concatchunks-ext2-no-burnin.trees concatchunks-ext-combined.trees
/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 0 -heights median concatchunks-ext-combined.trees chunks-ext-combined.tre

As can be seen, the ages are nearly identical:

BEAST under the extended calibration set: first run only vs. combined run

BEAST under the extended calibration set: first run only vs. combined run

Comparison of the BEAST analyses under the uncorrelated lognormal clock and 12/28 calibrations with the manuscript tree

Comparison of the BEAST analyses under the uncorrelated lognormal clock and 12/28 calibrations with the manuscript tree

BEAST: Fixed local clocks

In the next step, a number of nodes were assigned their own (strict) clocks:

  • Eupercaria (Canthigaster figueiredoi + Cephalopholis argus)
  • Carangaria (Centropomus medius + Alepes kleinii)
  • Ovalenteria (Xenentodon cancila + Emblemariopsis sp.)
  • Pelagiaria (Chiasmodon niger + Psenes cyanophrys)
  • Syngnathiformes (Syngnathus fuscus + Pseudupeneus maculatus)
  • Gobiaria (Kurtus gulliveri + Periophthalmus barbarus)
  • Lepophidium profundorum + Carapus bermudensis
  • Rondeletiidae + Holocentridae (Rondeletia loricata + Myripristis leiognathus)
  • Paracanthopterygii (Lampris guttatus + Gadus morhua)

The objective was to assign a separate clock to each of the six major percomorph subgroups (Eupercaria, Carangaria, Ovalentaria, Pelagiaria, Syngnathiformes, Gobiaria), and then to each of their sister groups that branched off successively closer to the root, ending with the Paracanthopterygii. However, as BEAST is unable to give local clocks to individual tips, only outgroups comprising two or more terminal taxa were assigned their own clocks, while the rates of Batrachoides and Anoplogaster were governed by a common tree-wide clock (set up by the clock.rate parameter) separate from all the local clocks. Both the local clocks as well as the tree-wide clock were strict, and all were assigned the same prior: a lognormal distribution with a mean of 0.005 (in real space), a standard deviation of 1, and an initial value of 0.005.

By default, BEAST assumes that if the fixed local clock model is selected, the subset of nodes that are to be assigned their own clocks is identical to the set of calibrated nodes. This was not the case in the analysis below. However, this problem can be solved in the following way: (1) In the “Taxa” tab of BEAUTi, enforce the monophyly of those nodes which are going to receive their own clocks but not the monophyly of the calibrated nodes. Note that this makes no difference in practice, since the analysis is to be run on a fixed topology, but it does prevent BEAST from assigning local clocks to the calibrated nodes. (2) In the “Priors” tab of BEAUTi, both the calibrated nodes and the “clock-bearing” nodes are expected to receive calibration priors. While these are only available for the former category, the latter category can be effectively freed from the need to specify priors by selecting the Use Tree Prior option in BEAUTi.

All the other settings (substitution models, calibration densities, the fixed topology operator mix) were identical to the uncorrelated lognormal clock analysis described above.

The analysis was run as follows:

/home/linuxbrew/.linuxbrew/bin/beast -beagle_SSE -beagle_scaling dynamic fixed-local-strict-13-calib.xml

The chain reached the target length of 500 million generations after 11.626 days.

The effective samples sizes of individual parameters as calculated by LogAnalyser were as follows:

/home/linuxbrew/.linuxbrew/bin/loganalyser -burnin 50000000 -ess fixed-local-strict-13-calib.log
##                              statistic          median        ESS
## 1                            posterior -49123.84490000  2037.3359
## 2                                prior   -730.66560000   371.6547
## 3                           likelihood -48392.96990000 11889.6246
## 4                 treeModel.rootHeight    142.38790000 14889.8482
## 5                    tmrca(Aipichthys)    141.47640000 16338.0549
## 6                 tmrca(Berybolcensis)     73.45960000 13030.8789
## 7                     tmrca(Calatomus)     33.12950000   864.5692
## 8                    tmrca(Carangaria)     69.71650000   405.1534
## 9           tmrca(Carapus-Lepophidium)     60.37370000 16298.4431
## 10               tmrca(Chaetodontidae)     31.51700000  6679.4499
## 11                tmrca(Eastmanalepes)     49.54820000 12415.8822
## 12                  tmrca(Eobuglossus)     50.93410000  2272.5621
## 13                  tmrca(Eucoelopoma)     55.98260000 10615.3727
## 14                   tmrca(Eupercaria)     71.27540000   142.9556
## 15                     tmrca(Gobiaria)     57.76250000   616.5514
## 16               tmrca(Homonotichthys)    125.84800000 16307.8877
## 17                 tmrca(Mcconichthys)     63.40650000  6719.6232
## 18                         tmrca(Mene)     58.17530000  5287.7434
## 19                  tmrca(Ovalentaria)     58.53990000   926.1633
## 20                   tmrca(Pelagiaria)     73.88320000   954.1835
## 21               tmrca(Ramphexocoetus)     54.27350000  1819.7195
## 22  tmrca(Rondeletiidae-Holocentridae)    105.59100000 11628.5200
## 23                         tmrca(Root)    142.38790000 14889.8482
## 24              tmrca(Syngnathiformes)     67.57600000  1080.1733
## 25                       tmrca(Tarkus)     49.70150000  1083.8428
## 26           tmrca(Paracanthopterygii)    141.47640000 16338.0549
## 27           birthDeath.meanGrowthRate      0.01940000  7300.9057
## 28        birthDeath.relativeDeathRate      0.05580000 17849.9035
## 29                  Subset1HKYGX.kappa      3.23410000 18001.0000
## 30           Subset1HKYGX.frequencies1      0.38410000 17771.2752
## 31           Subset1HKYGX.frequencies2      0.23270000 17483.9830
## 32           Subset1HKYGX.frequencies3      0.15110000 18001.0000
## 33           Subset1HKYGX.frequencies4      0.23090000 16701.7774
## 34                  Subset1HKYGX.alpha      1.04300000 18001.0000
## 35                Subset2TRNEFG.kappa1      4.70270000 16211.6753
## 36                Subset2TRNEFG.kappa2      1.75900000 17135.2751
## 37                 Subset2TRNEFG.alpha      0.68160000 18001.0000
## 38                Subset3TRNEFG.kappa1      3.02350000 17716.5092
## 39                Subset3TRNEFG.kappa2      4.82520000 16905.5261
## 40                 Subset3TRNEFG.alpha      0.47330000 17606.3865
## 41                    Subset4GTRIGX.ac      0.18730000 15377.2684
## 42                    Subset4GTRIGX.ag      0.42460000 17598.3592
## 43                    Subset4GTRIGX.at      0.10000000 16575.0251
## 44                    Subset4GTRIGX.cg      0.36950000 18001.0000
## 45                    Subset4GTRIGX.gt      0.14950000 18001.0000
## 46          Subset4GTRIGX.frequencies1      0.30510000 17825.4731
## 47          Subset4GTRIGX.frequencies2      0.20550000 16197.4361
## 48          Subset4GTRIGX.frequencies3      0.31080000 16847.2357
## 49          Subset4GTRIGX.frequencies4      0.17760000 18001.0000
## 50                 Subset4GTRIGX.alpha      0.45330000 18001.0000
## 51                  Subset5K80IG.kappa      3.52390000 18001.0000
## 52                  Subset5K80IG.alpha      0.32360000 18001.0000
## 53                      Subset6SYMG.ac      0.33140000 17418.7030
## 54                      Subset6SYMG.ag      1.57810000 18001.0000
## 55                      Subset6SYMG.at      0.24350000 18001.0000
## 56                      Subset6SYMG.cg      1.25130000 17269.7941
## 57                      Subset6SYMG.gt      0.26090000 18001.0000
## 58                   Subset6SYMG.alpha      0.52430000 17327.8499
## 59                   Subset7K80G.kappa      2.84290000 16893.2805
## 60                   Subset7K80G.alpha      0.74240000 16946.4298
## 61                   Subset8K80G.kappa      3.13520000 18001.0000
## 62                   Subset8K80G.alpha      1.25100000 16472.9896
## 63                  Subset9HKYGX.kappa      3.53970000 18001.0000
## 64           Subset9HKYGX.frequencies1      0.21280000 18001.0000
## 65           Subset9HKYGX.frequencies2      0.25950000 18001.0000
## 66           Subset9HKYGX.frequencies3      0.30070000 18001.0000
## 67           Subset9HKYGX.frequencies4      0.22630000 18001.0000
## 68                  Subset9HKYGX.alpha      0.64480000 18001.0000
## 69               Subset10TRNEFG.kappa1      4.13090000 18001.0000
## 70               Subset10TRNEFG.kappa2      5.85360000 18001.0000
## 71                Subset10TRNEFG.alpha      0.52020000 17821.6012
## 72                   Subset11JCG.alpha      0.38460000 18001.0000
## 73               Subset12TRNEFG.kappa1      3.02610000 18001.0000
## 74               Subset12TRNEFG.kappa2      6.89270000 18001.0000
## 75                Subset12TRNEFG.alpha      0.35850000 16840.6897
## 76                     Subset1HKYGX.mu      1.14530000 16457.0492
## 77                    Subset2TRNEFG.mu      0.87810000 18001.0000
## 78                    Subset3TRNEFG.mu      0.29580000 18001.0000
## 79                    Subset4GTRIGX.mu      0.73260000 18001.0000
## 80                     Subset5K80IG.mu      0.82460000 18001.0000
## 81                      Subset6SYMG.mu      1.69630000 18001.0000
## 82                      Subset7K80G.mu      1.85100000 18001.0000
## 83                      Subset8K80G.mu      1.06370000 18001.0000
## 84                     Subset9HKYGX.mu      0.59430000 17868.0226
## 85                   Subset10TRNEFG.mu      1.18480000 18001.0000
## 86                      Subset11JCG.mu      0.21330000 18001.0000
## 87                   Subset12TRNEFG.mu      2.42210000 18001.0000
## 88                          clock.rate      0.00059583  6499.0917
## 89                     Carangaria.rate      0.00038146  1707.5095
## 90            Carapus-Lepophidium.rate      0.00296050 17208.2276
## 91                     Eupercaria.rate      0.00063338   250.5997
## 92                       Gobiaria.rate      0.00177240   589.3144
## 93                    Ovalentaria.rate      0.00051243  1231.6271
## 94                     Pelagiaria.rate      0.00018009  2855.7223
## 95    Rondeletiidae-Holocentridae.rate      0.00016488 15539.6245
## 96                Syngnathiformes.rate      0.00146150  1436.4311
## 97             Paracanthopterygii.rate      0.00101200 16619.8225
## 98                            meanRate      0.00068337   633.3408
## 99              coefficientOfVariation      0.62670000   633.9622
## 100                         covariance      0.91110000  1054.6547
## 101        Subset1HKYGX.treeLikelihood  -1961.68430000 11412.9578
## 102       Subset2TRNEFG.treeLikelihood  -3050.01030000 15769.5321
## 103       Subset3TRNEFG.treeLikelihood  -2312.82940000  8179.2027
## 104       Subset4GTRIGX.treeLikelihood  -4909.92280000 12746.3787
## 105        Subset5K80IG.treeLikelihood  -5428.25720000 10132.1210
## 106         Subset6SYMG.treeLikelihood  -5933.73410000 10282.5876
## 107         Subset7K80G.treeLikelihood  -3700.27180000 11552.6385
## 108         Subset8K80G.treeLikelihood  -9197.00450000 12457.3059
## 109        Subset9HKYGX.treeLikelihood  -3783.68050000  8926.9213
## 110      Subset10TRNEFG.treeLikelihood  -4401.77360000  7695.6777
## 111         Subset11JCG.treeLikelihood   -997.18140000 17236.3680
## 112      Subset12TRNEFG.treeLikelihood  -2715.88600000 12128.8444
## 113                        branchRates      0.00000000 25000.0000
## 114                         speciation   -575.32740000   320.6261

Since the ESS values exceeded 200 for all parameters except one (tmrca(Eupercaria)), the maximum clade credibility tree with 95% HPD intervals for all node ages was constructed as a summary of the posterior distribution:

/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 50000000 -heights median fixed-local-strict-13-calib.trees fixed-local-strict-13-calib.tre

This tree was compared to a preliminary tree based on ~253 million generations to see if the estimated node ages (and that of Eupercaria in particular) changed substantially:

Comparison between the preliminary and final fixed local clock trees (13 calibrations)

Comparison between the preliminary and final fixed local clock trees (13 calibrations)

The 29-calibration version of the fixed local clock analysis was run using the following command, with all other settings identical to the 13-calibration analysis described above:

The analysis was run as follows:

/home/linuxbrew/.linuxbrew/bin/beast -beagle_SSE -beagle_scaling dynamic fixed-local-strict-29-calib.xml

The chain stopped when it reached the specificed length after 12.212 days.

The LogAnalyser-generated summary is shown below:

/home/linuxbrew/.linuxbrew/bin/loganalyser -burnin 50000000 -ess fixed-local-strict-29-calib.log
##                              statistic          median        ESS
## 1                            posterior -49254.91010000  2434.0981
## 2                                prior   -802.45410000   343.2358
## 3                           likelihood -48452.15530000  6045.9048
## 4                 treeModel.rootHeight    142.59460000 17076.6986
## 5                    tmrca(Aipichthys)    142.03190000 16781.5027
## 6              tmrca(Archaeotetraodon)     35.35890000   658.4703
## 7                      tmrca(Archaeus)     55.07860000 14127.8210
## 8                 tmrca(Berybolcensis)     71.52740000 13329.2519
## 9                     tmrca(Calatomus)     37.72320000   940.5783
## 10                   tmrca(Carangaria)     73.59020000   489.6684
## 11          tmrca(Carapus-Lepophidium)     59.88640000 14600.4244
## 12               tmrca(Chaetodontidae)     36.07980000  5150.9271
## 13                     tmrca(Cretzeus)    126.86700000 16617.4014
## 14                tmrca(Eastmanalepes)     50.21910000 16540.2405
## 15                     tmrca(Eobothus)     69.64970000   564.8522
## 16                  tmrca(Eobuglossus)     54.53650000  1514.1785
## 17                  tmrca(Eocoelopoma)     56.48070000 15252.7726
## 18                   tmrca(Eupercaria)     77.52990000   186.9551
## 19            tmrca(Gasterorhamphosus)     72.36050000  1547.8148
## 20                     tmrca(Gobiaria)     60.76200000   477.4315
## 21               tmrca(Homonotichthys)    127.35160000 17241.2467
## 22                  tmrca(Hoplopteryx)    121.19720000  3847.2492
## 23                      tmrca(Luvarus)     67.25050000  2017.7517
## 24               tmrca(Mahengechromis)     51.28850000  1832.4278
## 25              tmrca(Massamorichthys)     58.92270000  4292.1376
## 26                         tmrca(Mene)     60.14150000  5179.8827
## 27                  tmrca(Ovalentaria)     61.76030000  1363.4852
## 28                   tmrca(Pelagiaria)     77.59100000   806.6438
## 29                tmrca(Proacanthurus)     49.37680000  8132.0320
## 30              tmrca(Prosolenostomus)     70.48130000  2028.6770
## 31              tmrca(Rhamphexocoetus)     56.62920000  2183.8936
## 32                tmrca(Rhinocephalus)     60.47450000 15964.6785
## 33  tmrca(Rondeletiidae-Holocentridae)    103.62580000 11596.3732
## 34                         tmrca(Root)    142.59460000 17076.6986
## 35                    tmrca(Sphyraena)     67.29320000  2291.1410
## 36              tmrca(Syngnathiformes)     72.36050000  1547.8148
## 37                       tmrca(Tarkus)     52.32670000   567.8049
## 38                 tmrca(Trachipterus)     22.14160000 17507.0383
## 39                      tmrca(Triodon)     66.31820000   211.1453
## 40                     tmrca(Turkmene)     54.92690000  7314.7810
## 41                     tmrca(Zenopsis)     32.30250000  3187.7425
## 42           tmrca(Paracanthopterygii)    142.03190000 16781.5027
## 43           birthDeath.meanGrowthRate      0.01820000  7719.3617
## 44        birthDeath.relativeDeathRate      0.05020000 17687.3529
## 45                  Subset1HKYGX.kappa      3.21390000 18001.0000
## 46           Subset1HKYGX.frequencies1      0.38520000 17217.6627
## 47           Subset1HKYGX.frequencies2      0.23190000 17965.1316
## 48           Subset1HKYGX.frequencies3      0.15100000 18001.0000
## 49           Subset1HKYGX.frequencies4      0.23040000 18001.0000
## 50                  Subset1HKYGX.alpha      1.05020000 18001.0000
## 51                Subset2TRNEFG.kappa1      4.69450000 17363.5582
## 52                Subset2TRNEFG.kappa2      1.76780000 17993.2732
## 53                 Subset2TRNEFG.alpha      0.68610000 17740.4138
## 54                Subset3TRNEFG.kappa1      3.02860000 18001.0000
## 55                Subset3TRNEFG.kappa2      4.82050000 15075.2321
## 56                 Subset3TRNEFG.alpha      0.47340000 17723.5243
## 57                    Subset4GTRIGX.ac      0.18880000 18001.0000
## 58                    Subset4GTRIGX.ag      0.42650000 18001.0000
## 59                    Subset4GTRIGX.at      0.10000000 17599.1877
## 60                    Subset4GTRIGX.cg      0.37270000 17656.1057
## 61                    Subset4GTRIGX.gt      0.14980000 18001.0000
## 62          Subset4GTRIGX.frequencies1      0.30550000 17970.4523
## 63          Subset4GTRIGX.frequencies2      0.20480000 18001.0000
## 64          Subset4GTRIGX.frequencies3      0.31060000 16904.8408
## 65          Subset4GTRIGX.frequencies4      0.17800000 17955.3471
## 66                 Subset4GTRIGX.alpha      0.45400000 17373.4553
## 67                  Subset5K80IG.kappa      3.52170000 17830.6014
## 68                  Subset5K80IG.alpha      0.32450000 18001.0000
## 69                      Subset6SYMG.ac      0.33480000 17984.7861
## 70                      Subset6SYMG.ag      1.58200000 17645.0258
## 71                      Subset6SYMG.at      0.24540000 18001.0000
## 72                      Subset6SYMG.cg      1.24950000 17698.9665
## 73                      Subset6SYMG.gt      0.26100000 18001.0000
## 74                   Subset6SYMG.alpha      0.52600000 17529.0012
## 75                   Subset7K80G.kappa      2.83670000 18001.0000
## 76                   Subset7K80G.alpha      0.74340000 17117.0746
## 77                   Subset8K80G.kappa      3.12880000 17246.0245
## 78                   Subset8K80G.alpha      1.25610000 17487.3237
## 79                  Subset9HKYGX.kappa      3.53310000 17957.8614
## 80           Subset9HKYGX.frequencies1      0.21290000 18001.0000
## 81           Subset9HKYGX.frequencies2      0.25980000 18001.0000
## 82           Subset9HKYGX.frequencies3      0.30090000 18001.0000
## 83           Subset9HKYGX.frequencies4      0.22560000 18001.0000
## 84                  Subset9HKYGX.alpha      0.64580000 16997.3722
## 85               Subset10TRNEFG.kappa1      4.12010000 18001.0000
## 86               Subset10TRNEFG.kappa2      5.82070000 18001.0000
## 87                Subset10TRNEFG.alpha      0.52140000 18001.0000
## 88                   Subset11JCG.alpha      0.38510000 18001.0000
## 89               Subset12TRNEFG.kappa1      3.00570000 17780.5387
## 90               Subset12TRNEFG.kappa2      6.86550000 17692.5054
## 91                Subset12TRNEFG.alpha      0.36060000 17033.8502
## 92                     Subset1HKYGX.mu      1.14460000 18001.0000
## 93                    Subset2TRNEFG.mu      0.87870000 17710.6125
## 94                    Subset3TRNEFG.mu      0.29700000 18001.0000
## 95                    Subset4GTRIGX.mu      0.73490000 17883.7564
## 96                     Subset5K80IG.mu      0.82610000 18001.0000
## 97                      Subset6SYMG.mu      1.69220000 18001.0000
## 98                      Subset7K80G.mu      1.85250000 18001.0000
## 99                      Subset8K80G.mu      1.06560000 17875.7495
## 100                    Subset9HKYGX.mu      0.59680000 16795.9376
## 101                  Subset10TRNEFG.mu      1.18470000 18001.0000
## 102                     Subset11JCG.mu      0.21330000 17663.5206
## 103                  Subset12TRNEFG.mu      2.40260000 18001.0000
## 104                         clock.rate      0.00060673  6822.7059
## 105                    Carangaria.rate      0.00036147  1579.6530
## 106           Carapus-Lepophidium.rate      0.00297940 18001.0000
## 107                    Eupercaria.rate      0.00054166   416.9227
## 108                      Gobiaria.rate      0.00167370   490.1731
## 109                   Ovalentaria.rate      0.00047408  1929.6305
## 110                    Pelagiaria.rate      0.00017121  3161.6316
## 111   Rondeletiidae-Holocentridae.rate      0.00016878 14973.1137
## 112               Syngnathiformes.rate      0.00134710  3713.7027
## 113            Paracanthopterygii.rate      0.00094029 16643.5391
## 114                           meanRate      0.00063772   612.9078
## 115             coefficientOfVariation      0.63710000   590.3531
## 116                         covariance      0.91730000   668.7762
## 117        Subset1HKYGX.treeLikelihood  -1963.34490000 13177.2867
## 118       Subset2TRNEFG.treeLikelihood  -3050.34800000 12504.2159
## 119       Subset3TRNEFG.treeLikelihood  -2316.59780000 14307.2615
## 120       Subset4GTRIGX.treeLikelihood  -4918.48890000 11326.9575
## 121        Subset5K80IG.treeLikelihood  -5439.47450000 11217.2113
## 122         Subset6SYMG.treeLikelihood  -5933.09870000 14748.4056
## 123         Subset7K80G.treeLikelihood  -3702.09060000 13827.7245
## 124         Subset8K80G.treeLikelihood  -9209.33330000 11425.4101
## 125        Subset9HKYGX.treeLikelihood  -3789.74140000 14233.0811
## 126      Subset10TRNEFG.treeLikelihood  -4410.91230000 10183.5781
## 127         Subset11JCG.treeLikelihood   -999.78800000 17085.3841
## 128      Subset12TRNEFG.treeLikelihood  -2717.90640000 13060.8015
## 129                        branchRates      0.00000000 25000.0000
## 130                         speciation   -582.71500000   328.4556

Similar to the 13-calibration analysis, the ESS values of the 29-calibration run also exceeded 200 for all parameters except tmrca(Eupercaria). A summary tree was therefore constructed and compared to a preliminary tree based on the first ~221 million generations:

/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 50000000 -heights median fixed-local-strict-29-calib.trees fixed-local-strict-29-calib.tre
Comparison between the preliminary and final fixed local clock trees (29 calibrations)

Comparison between the preliminary and final fixed local clock trees (29 calibrations)

Comparison of the BEAST analyses under the fixed local clocks and 12/28 calibrations with the manuscript tree

Comparison of the BEAST analyses under the fixed local clocks and 12/28 calibrations with the manuscript tree

MCMCTree analysis

In PAML, subsets of a concatenated alignment cannot be simply ignored as in BEAST but must be removed from the file, so that the sum of partition lengths equals the total number of nucleotides (PAML manual: p. 13). Moreover, the sites that make up a partition must be adjacent in the alignment. Therefore, the following information from the best_scheme file generated by PartitionFinder was used to generate a PAML-compatible alignment:

1 locus42, locus1
2 locus57, locus3, locus25, locus2
3 locus4
4 locus61, locus5, locus55, locus63, locus23, locus47
5 locus16, locus54, locus45, locus31, locus7, locus64, locus6
6 locus15, locus24, locus8, locus11, locus53, locus52, locus51, locus20, locus26
7 locus14, locus17, locus9, locus27, locus22, locus28
8 locus10, locus21, locus13
9 locus62, locus37, locus59, locus29, locus12, locus58, locus46, locus66, locus44, locus18
10 locus40, locus39, locus49, locus19, locus32, locus38
11 locus35, locus60, locus33, locus43, locus30
12 locus34
13 locus36, locus48, locus65
14 locus56, locus50, locus41
  1. Using bash, replace whitespaces between each taxon name and the corresponding sequence with line breaks:

    xargs -n 1 < concat.phy > concat2.phy
  2. Manually remove the first line indicating the number of taxa and sites.

  3. Now, use the structure of the new file (with names and sequences in alternating rows) to reorganize the chunks:

    concat <- read.table("/Users/David/Downloads/concat2.phy", stringsAsFactors = FALSE)
    
    # Random check, part 1: print "locus 42" (sites 2051--2100) of the first taxon:
    substring(concat[2,], 2051, 2100)
    
    for(i in seq(2, nrow(concat), by = 2)) {
      chunks <- substring(concat[i,], seq(1, 3250, 50), seq(50, 3250, 50))
      chunks[66] <- substring(concat[i,], 3251, 3297)
      ch1 <- paste(chunks[42], chunks[1], sep = "")
      ch2 <- paste(chunks[57], chunks[3], chunks[25], chunks[2], sep = "")
      ch3 <- paste(chunks[61], chunks[5], chunks[55], chunks[63], chunks[23], chunks[47], sep = "")
      ch4 <- paste(chunks[16], chunks[54], chunks[45], chunks[31], chunks[7], chunks[64], chunks[6], sep = "")
      ch5 <- paste(chunks[15], chunks[24], chunks[8], chunks[11], chunks[53], chunks[52], chunks[51], chunks[20], chunks[26], sep = "")
      ch6 <- paste(chunks[14], chunks[17], chunks[9], chunks[27], chunks[22], chunks[28], sep = "")
      ch7 <- paste(chunks[10], chunks[21], chunks[13], sep = "")
      ch8 <- paste(chunks[62], chunks[37], chunks[59], chunks[29], chunks[12], chunks[58], chunks[46], chunks[66], chunks[44], chunks[18], sep = "")
      ch9 <- paste(chunks[40], chunks[39], chunks[49], chunks[19], chunks[32], chunks[38], sep = "")
      ch10 <- paste(chunks[35], chunks[60], chunks[33], chunks[43], chunks[30], sep = "")
      ch11 <- paste(chunks[36], chunks[48], chunks[65], sep = "")
      ch12 <- paste(chunks[56], chunks[50], chunks[41], sep = "")
      concat[i,] <- paste(ch1, ch2, ch3, ch4, ch5, ch6, ch7, ch8, ch9, ch10, ch11, ch12, sep = "")
    }
    
    # Make sure that the new sequence rows have the desired length:
    for(i in seq(2, nrow(concat), by = 2)) {
      print(nchar(concat[i,]))
    }
    
    # Random check, part 2: "locus 42" should now correspond to sites 1--50. Does it?
    substring(concat[2,], 1, 50)
    
    # Yes! Now print the new alignment into a table:
    write.table(concat,
      "/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Concat/pamlconcat.phy",
      quote = FALSE,
      row.names = FALSE,
      col.names = FALSE)
  4. Add the following two lines to the beginning of the file:

    118 3197 G
    G 12 100 200 300 350 450 300 150 497 300 250 150 150

Unlike BEAST, PAML cannot assign a separate substitution model to each partition, but it is capable of unlinking substitution model parameters across partitions (Warnock et al. 2014: ESM p. 2). Since moderate substitution model overparameterization usually does not pose a problem to Bayesian phylogenetic analyses (Ronquist & Deans 2010), each partition was assigned its own GTR+\(\Gamma\) (“REV”) model. Note that the unlinking of substitution models necessitates the use of empirical (nhomo = 0) rather than estimated base frequencies. Relative rate, equilibrium frequency, and alpha parameters were unlinked across partitions, but branch lengths were kept linked (options Mgene = 4 and Malpha = 1). To facilitate cross-platform comparisons, the root calibration was set to 120.5 Ma (same as the mean of the root prior used in BEAST) in baseml to calculate substitution rates.

The substitution rate estimation finished up in 5:46:48 and yielded the following values:

Partition Rate (subst. per 100 million years)
Gene 1 0.126608
Gene 2 0.087695
Gene 3 0.033200
Gene 4 0.072678
Gene 5 0.072362
Gene 6 0.158487
Gene 7 0.188372
Gene 8 0.107932
Gene 9 0.060550
Gene 10 0.111402
Gene 11 0.026772
Gene 12 0.184536

For MCMCTree, the G option cannot be used, and the partitions must be given as multiple alignments one after another in a single file (http://groups.google.com/forum/#!topic/pamlsoftware/cC7mOgZnNiY). Such a file was compiled as follows:

  1. Run the following code:

    concat <- read.table("/Users/David/Downloads/concat2.phy", stringsAsFactors = FALSE)
    
    alignments <- matrix(nrow = 12*(nrow(concat) + 2), ncol = 1)
    for(i in 1:(nrow(concat)/2)) {
        for(j in 0:11) {
            alignments[(2*i-1) + j*(nrow(concat) + 2), ] <- concat[(2*i-1),]
        }
      chunks <- substring(concat[2*i,], seq(1, 3250, 50), seq(50, 3250, 50))
      chunks[66] <- substring(concat[2*i,], 3251, 3297)
      alignments[2*i,] <- paste(chunks[42], chunks[1], sep = "")
      alignments[2*i + 1*(nrow(concat) + 2),] <- paste(chunks[57], chunks[3], chunks[25], chunks[2], sep = "")
      alignments[2*i + 2*(nrow(concat) + 2),] <- paste(chunks[61], chunks[5], chunks[55], chunks[63], chunks[23], chunks[47], sep = "")
      alignments[2*i + 3*(nrow(concat) + 2),] <- paste(chunks[16], chunks[54], chunks[45], chunks[31], chunks[7], chunks[64], chunks[6], sep = "")
      alignments[2*i + 4*(nrow(concat) + 2),] <- paste(chunks[15], chunks[24], chunks[8], chunks[11], chunks[53], chunks[52], chunks[51], chunks[20], chunks[26], sep = "")
      alignments[2*i + 5*(nrow(concat) + 2),] <- paste(chunks[14], chunks[17], chunks[9], chunks[27], chunks[22], chunks[28], sep = "")
      alignments[2*i + 6*(nrow(concat) + 2),] <- paste(chunks[10], chunks[21], chunks[13], sep = "")
      alignments[2*i + 7*(nrow(concat) + 2),] <- paste(chunks[62], chunks[37], chunks[59], chunks[29], chunks[12], chunks[58], chunks[46], chunks[66], chunks[44], chunks[18], sep = "")
      alignments[2*i + 8*(nrow(concat) + 2),] <- paste(chunks[40], chunks[39], chunks[49], chunks[19], chunks[32], chunks[38], sep = "")
      alignments[2*i + 9*(nrow(concat) + 2),] <- paste(chunks[35], chunks[60], chunks[33], chunks[43], chunks[30], sep = "")
      alignments[2*i + 10*(nrow(concat) + 2),] <- paste(chunks[36], chunks[48], chunks[65], sep = "")
      alignments[2*i + 11*(nrow(concat) + 2),] <- paste(chunks[56], chunks[50], chunks[41], sep = "")
    }
    write.table(alignments, "/Users/David/Downloads/pamlconcatpartitioned.phy", quote = FALSE, row.names = FALSE, col.names = FALSE, na = "")
  2. Manually add the information about the number of taxa and sites in each partition.

To obtain a single estimate that could be used for the gamma-Dirichlet hyperprior on rates (rgene_gamma), a weighted average of these values was computed, with each rate weighted by the length of the corresponding partition:

## [1] 0.09550072

The shape parameter of the gamma-Dirichlet distribution (\(\alpha\)) was set to 2 and the rate parameter (\(\beta\)) was chosen so that the mean (calculated as \(\frac{\alpha}{\beta}\)) was equal to the rate above (expressed as the number of substitutions per 10 million years):

## [1] 209.4225

The hyperprior on the mean of the rate distribution is distributed as follows:

The mean of the gamma hyperprior on the variance of the log rate was set to 0.1 by setting \(\alpha\) equal to 1 and \(\beta\) equal to 10. This corresponds closely to the mean of the ucld.stdev hyperprior in BEAST (0.3 – note that while BEAST places the prior on the standard deviation of the rate distribution, MCMCTree assigns the prior to the variance, or the square of the standard deviation).

The lognormal distribution of rates is plotted below, with the mean and variance (in log-space) set equal to the means of the respective hyperpriors:

Finally, since there are multiple loci, the Dirichlet concentration parameter \(\alpha_D\) was specified and set to the default value of 1, which is described as producing “a reasonable partitioning” in the MCMCTree manual.

The full configuration file is shown below:

          seed = -1
       seqfile = pamlconcatpartitioned.phy
      treefile = 12_cali_no_outgroups_corrected.tre
       outfile = chunks.txt

         ndata = 12                            * Number of partitions
       seqtype = 0                             * Data type: nucleotides
       usedata = 3                             * Store the Hessian matrix for approximate likelihood computation in out.BV
         clock = 2                             * Uncorrelated lognormal relaxed clock
       RootAge = 'B(9.8, 14.3, 1e-300, 0.05)'  * P of less than 98 Ma = 10^(-300) and P of more than 143 Ma = 0.05

         model = 7                             * GTR
         alpha = 0.1                           * Following Alfaro et al.
         ncatG = 8                             * Following Alfaro et al.

     cleandata = 0                             * Do not remove sites with ambiguity

       BDparas = 0.1 0.1 0.01                  * Birth, death, sampling: following Alfaro et al.
   kappa_gamma = 6 2                           * No effect since usedata = 3
   alpha_gamma = 1 1                           * No effect since usedata = 3

   rgene_gamma = 2 209.42 1                    * Gamma-Dirichlet prior on mean rate: estimated using baseml under strict clock
  sigma2_gamma = 1 10 1                        * Gamma-Dirichlet prior on log rate variance

      finetune = 1: .1 .1 .1 .1 .1 .1          * Auto finetune: times, musigma2, rates, mixing, paras, FossilErr

         print = 2                             * Print branch rates into an output file
        burnin = 500000                        * Following Alfaro et al.
      sampfreq = 500                           * Following Alfaro et al.
       nsample = 15000                         * Following Alfaro et al.

The initial analyses (run under the usedata = 3 option to calculate the Hessian matrices for the branch lengths) produced warning messages about branch lengths close to zero, but these did not cause baseml to crash, and the Hessian matrices were successfully written into out.BV files. Therefore, the usedata variable was set to 2, the out.BV files were moved to in.BV, and 6 separate MCMC chains with a length of 75 million generations (as specified in the configuration file above) were started:

Chain Time tree analysis
paml-sortadate 549:59:46
paml-sortadate-repl1 544:27:03
paml-sortadate-repl2 549:12:20
paml-sortadate-repl3 551:16:33
paml-sortadate-repl4 497:28:09
paml-sortadate-repl5 502:41:18
Comparison with the MCMCTree analysis of the same dataset and with the manuscript tree (based on an MCMCTree analysis of the 4-partition dataset)

Comparison with the MCMCTree analysis of the same dataset and with the manuscript tree (based on an MCMCTree analysis of the 4-partition dataset)

Extended calibration set

The analysis was run under settings identical to those described above, except that the tree used contained 28 calibrations instead of 12. The chain finished up after 505:53:29 with the following results:

MCMCTree with the extended calibration set and uncorrelated rates vs. the manuscript tree

MCMCTree with the extended calibration set and uncorrelated rates vs. the manuscript tree

Autocorrelation

Again, the settings were identical to the original analysis except that the clock variable was set to 3 (autocorrelated rates) rather than 2 (independent rates) in the configuration file. The analysis reached the target number of samples after 594:27:56, with the posterior distribution was automatically summarized into the following tree:

MCMCTree with aucorrelated rates vs. the manuscript tree

MCMCTree with aucorrelated rates vs. the manuscript tree

Extended calibration set with autocorrelation

This set-up was a combination of the two previous analyses, with the 28-calibration set and the autocorrelated lognormal relaxed clock model. After 603:04:28, the MCMC sampler collected the specified number of trees, which were then summarized as follows:

MCMCTree with the extended calibration set and autocorrelated rates vs. the manuscript tree

MCMCTree with the extended calibration set and autocorrelated rates vs. the manuscript tree

PhyloBayes analysis

First, the hyperparameters of the birth-death process were estimated by running PhyloBayes under a topological constraint (without branch lengths or calibrations). PhyloBayes cannot assign different substitution models to different partitions, nor can it (unlike MCMCTree) unlink model parameters across partitions. However, the following options are available:

  • -gbl flag: give a separate set of branch lengths to each user-specified partition
  • -dp/-cat model: infinite mixture of equilibrium frequency profiles
  • -qmm model: infinite mixture of Q-matrices (i.e., of both equilibrium frequencies and exchangeabilities)

Since a node-dating analysis estimates rates and times separately instead of branch lengths, and since the -cat and -qmm options are very time-intensive and their use can lead to converge problems, the SortaDate alignment was analyzed as a single unpartitioned dataset. As there were no partitions with separate models, the risk of overparameterization was eliminated, and the full dataset (length = 3297 bp) was therefore analyzed instead of the one used for BEAST and PAML, which excluded the two shortest partitions identified by PartitionFinder (length = 3197 bp). The chain length was set to 25,000 generations, which was expected to be sufficient for reaching convergence based on the results of previous analyses.

../pb -d concat.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 -x 1 25000 sortadate-nocal

The chain finished up with the following autocorrelation times and effective sample sizes:

pbNocal <- read.table("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Concat/PhyloBayes/nocal-tracer-info.txt", header = FALSE, sep = "\t")
pbNocalTrace <- read.csv("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Concat/PhyloBayes/sortadate-nocal.trace", sep = "\t")
colnames(pbNocal)[2:16] <- colnames(pbNocalTrace)[-c(1:3)]
t(pbNocal[c(10,11),])
##           10                            11                           
## V1        "auto-correlation time (ACT)" "effective sample size (ESS)"
## loglik    "1.597"                       "14088.253"                  
## length    "9.0591"                      "2483.5769"                  
## sigma     "1.5157"                      "14844.2411"                 
## mu        "11.7203"                     "1919.6668"                  
## meanrates "4.9723"                      "4524.8341"                  
## p1        "2.5015"                      "8994.0888"                  
## p2        "6.4697"                      "3477.601"                   
## alpha     "1.5033"                      "14966.6127"                 
## nmode     "n/a"                         "n/a"                        
## stat      "6.9677"                      "3229.0588"                  
## statalpha "n/a"                         "n/a"                        
## rrent     "2.2523"                      "9989.3947"                  
## meanrr    "11.1859"                     "2011.3664"                  
## kappa     "n/a"                         "n/a"                        
## allocent  "n/a"                         "n/a"

Based on these values, the chain was summarized using readdiv under the following settings:

../readdiv -x 2500 20 sortadate-nocal

Birth-death prior hyperparameters:

p1 p2
0.00132097 +/- 0.00115672 0.000782745 +/- 0.00034246

These values were used to perform the calibrated analyses (under both the 12-calibration and 28-calibration datasets). Again, the values supplied to the -x flag were chosen based on the previous analyses of comparably large datasets:

../pb -d concat.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00132097 0.000782745 -cal calib_root_corrected -sb -gtr -dgam 4 -x 1 20000 sortadate-12cal
../pb -d concat.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00132097 0.000782745 -cal extendcorrectcalib -sb -gtr -dgam 4 -x 1 20000 sortadate-28cal

The analyses finished up with the following ESS values and autocorrelation periods:

pb12cal <- read.table("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Concat/PhyloBayes/12cal-tracer-info.txt", header = FALSE, sep = "\t")
pb12calTrace <- read.csv("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Concat/PhyloBayes/sortadate-12cal.trace", sep = "\t")
colnames(pb12cal)[2:17] <- colnames(pb12calTrace)[-c(1:3)]
t(pb12cal[c(10,11),])
##           10                            11                           
## V1        "auto-correlation time (ACT)" "effective sample size (ESS)"
## loglik    "1.7202"                      "10463.3747"                 
## length    "8.7058"                      "2067.4797"                  
## sigma     "1.4565"                      "12357.6514"                 
## mu        "9.5617"                      "1882.4052"                  
## meanrates "3.4236"                      "5257.3092"                  
## p1        "n/a"                         "n/a"                        
## p2        "n/a"                         "n/a"                        
## scale     "3.7748"                      "4768.188"                   
## alpha     "1.485"                       "12120.3981"                 
## nmode     "n/a"                         "n/a"                        
## stat      "7.1731"                      "2509.2251"                  
## statalpha "n/a"                         "n/a"                        
## rrent     "2.2276"                      "8080.1737"                  
## meanrr    "12.1996"                     "1475.3767"                  
## kappa     "n/a"                         "n/a"                        
## allocent  "n/a"                         "n/a"
pb28cal <- read.table("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Concat/PhyloBayes/28cal-tracer-info.txt", header = FALSE, sep = "\t")
pb28calTrace <- read.csv("/Users/David/Grive/Alfaro_Lab/Acanthomorpha/sate-gblocks-clean-min-90-taxa/Concat/PhyloBayes/sortadate-28cal.trace", sep = "\t")
colnames(pb28cal)[2:17] <- colnames(pb28calTrace)[-c(1:3)]
t(pb28cal[c(10,11),])
##           10                            11                           
## V1        "auto-correlation time (ACT)" "effective sample size (ESS)"
## loglik    "1.7064"                      "10547.9241"                 
## length    "7.5954"                      "2369.7149"                  
## sigma     "1.6232"                      "11088.572"                  
## mu        "11.2716"                     "1596.8479"                  
## meanrates "4.0612"                      "4431.96"                    
## p1        "n/a"                         "n/a"                        
## p2        "n/a"                         "n/a"                        
## scale     "8.9322"                      "2015.0681"                  
## alpha     "1.4647"                      "12288.4722"                 
## nmode     "n/a"                         "n/a"                        
## stat      "5.7487"                      "3130.9447"                  
## statalpha "n/a"                         "n/a"                        
## rrent     "2.4083"                      "7473.7659"                  
## meanrr    "14.0699"                     "1279.2564"                  
## kappa     "n/a"                         "n/a"                        
## allocent  "n/a"                         "n/a"

The chains were summarized as follows:

../readdiv -x 2000 20 sortadate-12cal
../readdiv -x 2000 20 sortadate-28cal

12-calibration set overflows (italics) and underflows (bold):

Calibration Lower bound Upper bound Node number Mean
Root 98 143 118 138.664
Aipichthys 98 128.82 119 126.112
Berybolcensis 49 109.29 132 79.9844
Calatomus 11.9 43.95 210 28.5936
Chaetodontidae 29.62 59.26 220 57.8165
Eastmanalepes 49 61.61 195 54.998
Eobuglossus 41.2 53.88 189 45.4263
Eocoelopoma 54.17 95.58 162 69.313
Homonotichthys 93.9 116.35 123 115.312
Mcconichthys 63.1 93.51 124 57.9158
Mene 55.2 95.64 192 71.7266
Rhamphexocoetus 49 80.52 173 74.3516
Tarkus 49 53.93 229 50.5079
Calibration Lower bound Upper bound Node number rcluster1ext
Root 98.0 143.0 118 132.175
Aipichthys 98.0 143.0 119 126.049
Archaeotetraodon 32.02 58.5 231 32.2761
Archaeus 54.17 80.8 193 62.01
Berybolcensis 49.0 93.7 132 62.4285
Calatomus 11.9 63.3 210 30.9777
Chaetodontidae 29.62 66.0 220 65.7377
Cretzeus 69.71 111.3 125 106.648
Eastmanalepes 49.0 69.3 195 57.4984
Eobothus 49.0 79.7 186 69.6748
Eobuglossus 41.2 66.6 189 47.3793
Eocoelopoma 54.17 94.2 162 66.8977
Gasterorhamphosus 69.71 111.3 154 84.51
Homonotichthys 93.9 116.35 123 115.207
Hoplopteryx 93.9 128.0 129 125.03
Luvarus 54.17 80.8 219 78.8816
Mahengechromis 45.46 67.4 174 60.6318
Massamorichthys 58.551 109.6 124 48.5732
Mene 55.2 94.6 192 71.5182
Proacanthurus 49.0 69.3 222 52.0353
Prosolenostomus 49.0 93.7 155 78.3679
Rhamphexocoetus 49.0 79.7 173 73.5149
Rhinocephalus 53.7 94.4 128 54.059
Sphyraena 49.0 79.7 183 78.0494
Tarkus 49.0 61.6 229 50.4162
Trachipterus 5.33 94.0 121 25.6576
Triodon 50.5 69.6 230 64.0869
Turkmene 54.17 122.5 120 57.4098
Zenopsis 32.02 91.4 126 39.0751

More detailed information on underflows and overflows is available in the respective _sample.calib files.

Comparisons of the PhyloBayes, MCMCTree, and BEAST analyses of the SortaDate alignment under the 12-calibration set with the manuscript tree

Comparisons of the PhyloBayes, MCMCTree, and BEAST analyses of the SortaDate alignment under the 12-calibration set with the manuscript tree

Comparisons of the PhyloBayes, MCMCTree, and BEAST analyses of the SortaDate alignment under the 28-calibration set with the manuscript tree

Comparisons of the PhyloBayes, MCMCTree, and BEAST analyses of the SortaDate alignment under the 28-calibration set with the manuscript tree

Test for autocorrelation in PhyloBayes

Rationale: in order to perform Bayes factor model comparisons in PhyloBayes, it is necessary to first run the estbranches program from the Multidistribute package to calculate a variance-covariance matrix for branch lengths. As its input, estbranches requires a substitution model with fixed-value parameters (i.e., the parameters of the model are not estimated simultaneously with the branch lengths). A properly formatted model file can be generated from baseml output using one of the scripts in the package.

  1. Compile paml2modelinf.c from the Multidistribute package v 9/25/03:

    cc paml2modelinf.c arrayutl.c -lm -O -o paml2modelinf
  2. Run baseml on an unpartitioned version of the SortaDate alignment. This was done using a fixed tree topology with a single root calibration at 120.5 Ma (equal to the mean of the uniform distribution with support from 98 to 143 Ma) and the F84 substitution model, which is required by estbranches.

PhyloBayes uses Bayes factors with marginal likelihoods computed using thermodynamic integration (path sampling) for model comparisons. All comparisons are between the user’s model of choice and the default, unconstrained model, whose prior is defined for unrooted trees (Lepage et al. 2007). The commands below were used to perform comparisons between this model and (1) the autocorrelared lognormal model as well as (2) the uncorrelated gamma model, which is the closest equivalent of the uncorrelated models implemented in BEAST (the uncorrelated exponential model offered in BEAUTi is its special case).

Note that the .oes file used for model comparison has the following components:

  1. topology without tip names but with (what appears to be) branch lengths
  2. list of tip names and numbers
  3. list of parent-child triplets (with tips represented by their numbers)
  4. variance-covariance matrix

According to the manual, the variance-covariance matrix can be obtained using the estbranches program from the multidistribute package. The PAML manual states that MCMCTree under the usedata = 3 setting performs the same function. Since PhyloBayes does not support partitioning, a new PAML analysis was run on an unpartitioned version of the dataset but with otherwise identical settings. However, the PAML-generated matrix was not readable by PhyloBayes, and estbranches was used to calculate the matrix. The tests were run as follows:

../bf -cov chunkbrlen.oes -long -ln chunklnbf
../bf -cov chunkbrlen.oes -long -ugam chunkugambf
  • Uncorrelated gamma vs unconstrained: logBF = 36.6282 [36.4131, 41.7097]
  • Autocorrelated lognormal vs. unconstrained: logBF = 77.3395 [59.7214, 78.3891]

Tip dating in BEAST2

BEAST2 was installed on the ‘yog-sothoth’ cluster, with the Sampled Ancestors package added via command line:

java -cp beast.jar beast.util.AddOnManager -add SA

The substitution models used for the individual partitions were the same for the tip-dating analyses as for the node-dating analyses above, following the optimum scheme recovered by PartitionFinder. However, different priors were placed on their parameters following the recommendations of Heath (2016). The only free transition rate parameter (rateAG) was assigned a beta prior with \(\alpha = 2\) and \(\beta = 0.5\); all transversion parameters were assigned a beta prior with \(\alpha\) set to 2 and \(\beta\) to 0.25. The choice of the rate model (uncorrelated lognormal clock) as well as the hyperpriors placed on its parameters were the same as in the initial node-dating analysis.

The fossilized birth-death (FBD) tree prior was condition on the age of the root, following the assumption that all fossils included in the analysis are descendants of the most recent common ancestor of all living acanthomorphs. As recommended by Heath (2016), an exponential hyperprior with a mean of 1 was used for the diversification rate. The sampling proportion was assigned a beta hyperprior with \(\alpha = \beta =\) 2, and the default hyperprior was placed on the turnover hyperparameter. The proportion of sampled extant species was fixed to 0.01, since the 118 species represented in the matrix by their sequence data represent ~1% of the total diversity of extant acanthomorphs.

13 taxon sets were created using BEAUti 2.4.6. Twelve of these corresponded to the internal calibrations used in the node-dating analyses, and their monophyly was enforced without providing any prior distribution. Each fossil was treated as a member of the same crown clade to whose MRCA it was assigned in the node-dating analyses. To ensure that our monophyly constraints were not mutually contradictory, we enforced the membership of some fossils in multiple taxon sets when the latter were nested within one another. As an example, Mcconichthys was constrained to be a member not only of the clade originating with the most recent common ancestor of Percopsis omiscomaycus and Aphredoderus sayanus, but also of the more inclusive clade originating with the MRCA of Percopsis omiscomaycus and Polymixia lowei. One more taxon set was created for all tips included in the analysis (including fossils) and assigned a uniform prior with support on (98 Ma, 143 Ma).

We used a random starting tree, whose root height was set to a deliberately low and arbitrary value of 0.1 after several failed attempts to initialize the analysis with a higher tree. The MCMC chain was run for 500 million generations, with a sampling period of 25,000 generations:

java -jar lib/beast.jar -beagle_SSE -threads 16 12-fossil-tip-dating-random-start.xml

The effective samples sizes of each parameter were determined using LogAnalyser:

/home/linuxbrew/.linuxbrew/bin/loganalyser -burnin 50000000 -ess 12-fossil-tip-dating.log

Since all the ESS values exceeded 200 samples, the posterior distribution of topologies and divergence times was summarized using TreeAnnotator:

/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 50000000 -heights median beast_tree.trees 12-fossil-tip-dating.tre
BEAST2 tip dating tree based on 12 calibrations

BEAST2 tip dating tree based on 12 calibrations

Similarly, an extended analysis with exactly the same setting and 16 additional taxon sets was started as follows:

java -jar lib/beast.jar -beagle_SSE -threads 16 28-fossil-tip-dating-random-start.xml

After the 28-fossil analysis finished up, LogAnalyser was used to calculate the effective sample sizes of individual parameters.

/home/linuxbrew/.linuxbrew/bin/loganalyser -burnin 50000000 -ess 28-fossil-tip-dating.log

Since all of these were substantially greater than 200 samples, TreeAnnotator was utilized to find the maximum clade credibility tree:

/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 50000000 -heights median tree.trees 28-fossil-tip-dating.tre
BEAST2 tip dating tree based on 28 calibrations

BEAST2 tip dating tree based on 28 calibrations

Uniform root prior

In the analyses above, the root age was only constrained by the FBD branching prior, which led to an unrealistically old posterior distribution. Therefore, two new analyses were started for the 12-fossil and 28-fossil datasets: in both cases, the settings were identical to the unconstrained analyses, except that a uniform prior from 98 to 143 Ma was placed on the age of the root. The 12-fossil analysis finished up first, with the following ESS values determined using LogAnalyser:

##                          statistic             median        ESS
## 1                        posterior -47233.90380000000  1706.2546
## 2                       likelihood -46548.22540000000  9172.7027
## 3                            prior   -685.73760000000   483.4322
## 4          treeLikelihood.10TRNEFG  -4257.98840000000 11463.4217
## 5             treeLikelihood.11JCG   -952.21320000000  6661.2970
## 6          treeLikelihood.12TRNEFG  -2622.50970000000  8509.8306
## 7            treeLikelihood.1HKYGX  -1890.52190000000  8109.2871
## 8           treeLikelihood.2TRNEFG  -2929.69250000000 12088.6246
## 9           treeLikelihood.3TRNEFG  -2228.74940000000  5769.0429
## 10          treeLikelihood.4GTRIGX  -4750.89330000000  7710.1689
## 11           treeLikelihood.5K80IG  -5272.65440000000  4833.8792
## 12            treeLikelihood.6SYMG  -5716.05980000000  4477.4851
## 13            treeLikelihood.7K80G  -3532.66970000000 10746.3054
## 14            treeLikelihood.8K80G  -8745.95340000000  8319.6953
## 15           treeLikelihood.9HKYGX  -3647.72100000000 11876.1048
## 16                      TreeHeight    139.77870000000  5953.6113
## 17           mutationRate.10TRNEFG      1.19620000000 17751.5836
## 18              mutationRate.11JCG      0.21220000000 17163.8262
## 19           mutationRate.12TRNEFG      2.53440000000 14910.8342
## 20             mutationRate.1HKYGX      1.19170000000 17756.3222
## 21            mutationRate.2TRNEFG      0.87620000000 16565.2591
## 22            mutationRate.3TRNEFG      0.29040000000 17528.7900
## 23            mutationRate.4GTRIGX      0.72680000000 16347.8897
## 24             mutationRate.5K80IG      0.79540000000 16647.5726
## 25              mutationRate.6SYMG      1.63040000000 17751.1299
## 26              mutationRate.7K80G      1.92710000000 16612.1757
## 27              mutationRate.8K80G      1.06120000000 17515.8853
## 28             mutationRate.9HKYGX      0.59890000000 16498.8882
## 29             gammaShape.10TRNEFG      0.50790000000 17754.0383
## 30                gammaShape.11JCG      0.39560000000 17928.0283
## 31             gammaShape.12TRNEFG      0.34560000000 16471.5577
## 32               gammaShape.1HKYGX      1.02500000000 16984.0140
## 33              gammaShape.2TRNEFG      0.68250000000 17221.0581
## 34              gammaShape.3TRNEFG      0.47450000000 15774.3157
## 35              gammaShape.4GTRIGX      0.45080000000 16374.5123
## 36               gammaShape.5K80IG      0.32290000000 16355.4098
## 37                gammaShape.6SYMG      0.52880000000 16727.5002
## 38                gammaShape.7K80G      0.71780000000 17966.0403
## 39                gammaShape.8K80G      1.23690000000 16757.8193
## 40               gammaShape.9HKYGX      0.63300000000 17424.3099
## 41                    kappa.1HKYGX      3.28860000000 16755.5928
## 42                    kappa.5K80IG      3.62050000000 17314.6747
## 43                     kappa.7K80G      2.93870000000 16837.1018
## 44                     kappa.8K80G      3.17730000000 16537.0922
## 45                    kappa.9HKYGX      3.59850000000 17149.5572
## 46                 kappa1.10TRNEFG      4.15130000000 15666.2777
## 47                 kappa1.12TRNEFG      3.09650000000 17554.8172
## 48                  kappa1.2TRNEFG      4.80560000000 17133.7936
## 49                  kappa1.3TRNEFG      3.05300000000 17542.6637
## 50                 kappa2.10TRNEFG      6.08630000000 17089.8513
## 51                 kappa2.12TRNEFG      7.39660000000 16966.4610
## 52                  kappa2.2TRNEFG      1.80010000000 16456.2555
## 53                  kappa2.3TRNEFG      4.86330000000 17106.4993
## 54                  rateAC.4GTRIGX      0.19230000000  9202.4514
## 55                    rateAC.6SYMG      0.31830000000 14772.2893
## 56                  rateAG.4GTRIGX      0.46970000000  5187.6089
## 57                    rateAG.6SYMG      1.59290000000 12224.3756
## 58                  rateAT.4GTRIGX      0.11760000000 10120.5998
## 59                    rateAT.6SYMG      0.25090000000 13441.4208
## 60                  rateCG.4GTRIGX      0.37040000000  7796.0882
## 61                    rateCG.6SYMG      1.21630000000 14521.5011
## 62                  rateGT.4GTRIGX      0.16390000000  9184.8305
## 63                    rateGT.6SYMG      0.24450000000 16356.3337
## 64                        ucldMean      0.00058720000   289.8192
## 65                       ucldStdev      0.98700000000  4290.0365
## 66                       rate.mean      0.00058056000   438.4319
## 67                   rate.variance      0.00000075588   702.4819
## 68     rate.coefficientOfVariation      1.34190000000   821.8079
## 69                             FBD   -636.12510000000   474.4917
## 70          diversificationRateFBD      0.04190000000  1806.5426
## 71                     turnoverFBD      0.89010000000   865.3903
## 72           samplingProportionFBD      0.00015135000  9783.4961
## 73                      SACountFBD      1.00000000000 11172.5077
## 74             logP(mrca(rootAge))      0.00000000000 25000.0000
## 75               mrcatime(rootAge)    139.77870000000  5953.6113
## 76      mrcatime(Aipichthys_clade)    120.08030000000   359.2417
## 77   mrcatime(Berybolcensis_clade)     71.73470000000  4987.6096
## 78       mrcatime(Calatomus_clade)     46.45260000000   896.2592
## 79  mrcatime(Chaetodontidae_clade)     64.44070000000  1386.7557
## 80   mrcatime(Eastmanalepes_clade)     60.62820000000  1354.0829
## 81     mrcatime(Eobuglossus_clade)     55.82730000000  1318.1157
## 82     mrcatime(Eocoelopoma_clade)     58.66390000000  2671.5243
## 83  mrcatime(Homonotichthys_clade)    102.77840000000   604.9902
## 84    mrcatime(Mcconichthys_clade)     75.56220000000  3057.0961
## 85            mrcatime(Mene_clade)     69.57500000000  1337.8064
## 86 mrcatime(Rhamphexocoetus_clade)     68.59380000000   789.5581
## 87          mrcatime(Tarkus_clade)     55.90610000000  1068.7634
## 88           freqParameter.1HKYGX1      0.38210000000 12019.0077
## 89           freqParameter.1HKYGX2      0.23080000000 14195.2672
## 90           freqParameter.1HKYGX3      0.15190000000 13161.9959
## 91           freqParameter.1HKYGX4      0.23330000000 13520.8211
## 92          freqParameter.4GTRIGX1      0.29820000000  7938.6371
## 93          freqParameter.4GTRIGX2      0.21850000000  5642.5622
## 94          freqParameter.4GTRIGX3      0.30920000000  7189.5888
## 95          freqParameter.4GTRIGX4      0.17270000000  7919.8622
## 96           freqParameter.9HKYGX1      0.20990000000 14105.1964
## 97           freqParameter.9HKYGX2      0.25860000000 14289.8511
## 98           freqParameter.9HKYGX3      0.30540000000 12052.0022
## 99           freqParameter.9HKYGX4      0.22500000000 13320.3506

Since all parameters had effective sample sizes of 200 or greater, the posterior was summarized using TreeAnnotator:

/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 50000000 -heights median 12-fossil-tip-dating-unif-root-prior.trees 12-fossil-tip-dating-unif-root-prior.tre
BEAST2 tip dating tree based on 12 calibrations and a uniform root prior

BEAST2 tip dating tree based on 12 calibrations and a uniform root prior

After the 28-fossil analysis reached the target length of 500 million generations, LogAnalyser was used to compute the following ESS values:

##                             statistic             median        ESS
## 1                           posterior -47391.52040000000  2823.8502
## 2                          likelihood -46549.77510000000  8061.0076
## 3                               prior   -841.67180000000  1238.5705
## 4             treeLikelihood.10TRNEFG  -4258.35840000000 11191.4818
## 5                treeLikelihood.11JCG   -951.86050000000 10182.9389
## 6             treeLikelihood.12TRNEFG  -2623.52080000000  9743.3822
## 7               treeLikelihood.1HKYGX  -1890.25990000000  6899.4901
## 8              treeLikelihood.2TRNEFG  -2929.34310000000 12578.4743
## 9              treeLikelihood.3TRNEFG  -2228.50880000000  5861.5717
## 10             treeLikelihood.4GTRIGX  -4751.31270000000  6264.6482
## 11              treeLikelihood.5K80IG  -5271.46200000000  6228.9323
## 12               treeLikelihood.6SYMG  -5716.34910000000  4898.4938
## 13               treeLikelihood.7K80G  -3531.85980000000 11009.0612
## 14               treeLikelihood.8K80G  -8747.22190000000  9898.7248
## 15              treeLikelihood.9HKYGX  -3649.08760000000 12514.7847
## 16                         TreeHeight    140.57470000000 10338.7332
## 17                       kappa.1HKYGX      3.28520000000 17000.8657
## 18                       kappa.5K80IG      3.62270000000 17036.5801
## 19                        kappa.7K80G      2.94240000000 17605.5156
## 20                        kappa.8K80G      3.17700000000 17936.8719
## 21                       kappa.9HKYGX      3.60440000000 16389.5960
## 22              mutationRate.10TRNEFG      1.19250000000 18001.0000
## 23                 mutationRate.11JCG      0.21120000000 15629.6236
## 24              mutationRate.12TRNEFG      2.55580000000 16866.0661
## 25                mutationRate.1HKYGX      1.19020000000 16513.1958
## 26               mutationRate.2TRNEFG      0.87520000000 18001.0000
## 27               mutationRate.3TRNEFG      0.29040000000 17979.9530
## 28               mutationRate.4GTRIGX      0.72620000000 16689.5174
## 29                mutationRate.5K80IG      0.79590000000 17143.3747
## 30                 mutationRate.6SYMG      1.63020000000 15606.1203
## 31                 mutationRate.7K80G      1.92350000000 18001.0000
## 32                 mutationRate.8K80G      1.05970000000 17199.8343
## 33                mutationRate.9HKYGX      0.59820000000 17452.6290
## 34                gammaShape.10TRNEFG      0.50610000000 16821.5527
## 35                   gammaShape.11JCG      0.39620000000 16721.0814
## 36                gammaShape.12TRNEFG      0.34310000000 16452.4442
## 37                  gammaShape.1HKYGX      1.02200000000 16724.6048
## 38                 gammaShape.2TRNEFG      0.68140000000 16994.3585
## 39                 gammaShape.3TRNEFG      0.47430000000 17008.1431
## 40                 gammaShape.4GTRIGX      0.45050000000 17015.2568
## 41                  gammaShape.5K80IG      0.32210000000 17592.5179
## 42                   gammaShape.6SYMG      0.52810000000 16585.4278
## 43                   gammaShape.7K80G      0.71830000000 17693.5402
## 44                   gammaShape.8K80G      1.23820000000 17200.9135
## 45                  gammaShape.9HKYGX      0.63260000000 17234.9393
## 46                    kappa1.10TRNEFG      4.15630000000 15527.0907
## 47                    kappa1.12TRNEFG      3.09130000000 16942.8632
## 48                     kappa1.2TRNEFG      4.81030000000 17363.6141
## 49                     kappa1.3TRNEFG      3.05280000000 16385.7003
## 50                    kappa2.10TRNEFG      6.07820000000 17057.8406
## 51                    kappa2.12TRNEFG      7.38440000000 15936.6179
## 52                     kappa2.2TRNEFG      1.80200000000 17202.8465
## 53                     kappa2.3TRNEFG      4.85190000000 17012.1082
## 54                     rateAC.4GTRIGX      0.19210000000  9855.2903
## 55                       rateAC.6SYMG      0.31930000000 14623.8651
## 56                     rateAG.4GTRIGX      0.47030000000  5502.4156
## 57                       rateAG.6SYMG      1.59790000000 12267.5820
## 58                     rateAT.4GTRIGX      0.11750000000 11479.6691
## 59                       rateAT.6SYMG      0.25070000000 14710.3109
## 60                     rateCG.4GTRIGX      0.37200000000  7986.6918
## 61                       rateCG.6SYMG      1.21620000000 13877.7986
## 62                     rateGT.4GTRIGX      0.16470000000 10089.2300
## 63                       rateGT.6SYMG      0.24380000000 14692.1310
## 64                           ucldMean      0.00060507000   520.6218
## 65                          ucldStdev      0.98660000000  3304.4547
## 66                          rate.mean      0.00059165000  1078.0868
## 67                      rate.variance      0.00000077131  1286.6491
## 68        rate.coefficientOfVariation      1.33610000000  1197.4815
## 69                                FBD   -792.19630000000  1206.5897
## 70             diversificationRateFBD      0.03560000000  3956.4622
## 71                        turnoverFBD      0.93880000000  2235.8560
## 72              samplingProportionFBD      0.00017288000  3241.7150
## 73                         SACountFBD      1.00000000000 14126.2958
## 74         mrcatime(Aipichthys_clade)    132.42040000000  2653.4347
## 75   mrcatime(Archaeotetraodon_clade)     50.37480000000   573.7681
## 76           mrcatime(Archaeus_clade)     68.72970000000  2402.5711
## 77      mrcatime(Berybolcensis_clade)     70.12580000000  4850.0053
## 78          mrcatime(Calatomus_clade)     42.70510000000   717.6381
## 79     mrcatime(Chaetodontidae_clade)     62.07220000000  2039.0593
## 80           mrcatime(Cretzeus_clade)    103.56940000000  1130.8002
## 81            mrcatime(Eastmanalepes)     59.69440000000  2429.0673
## 82           mrcatime(Eobothus_clade)     79.42710000000  1275.0543
## 83        mrcatime(Eobuglossus_clade)     54.32530000000  1773.8273
## 84        mrcatime(Eocoelopoma_clade)     57.95310000000  2866.4898
## 85  mrcatime(Gasterorhamphosus_clade)     89.90970000000  1616.5403
## 86     mrcatime(Homonotichthys_clade)    110.62100000000  2318.2603
## 87        mrcatime(Hoplopteryx_clade)    133.68180000000  2774.9714
## 88            mrcatime(Luvarus_clade)     75.62520000000   938.3603
## 89     mrcatime(Mahengechromis_clade)     58.03330000000  1665.9505
## 90    mrcatime(Massamorichthys_clade)     74.37210000000  3070.0687
## 91               mrcatime(Mene_clade)     68.75780000000  1833.9515
## 92      mrcatime(Proacanthurus_clade)     57.13970000000  3703.7034
## 93    mrcatime(Prosolenostomus_clade)     69.85620000000  1521.8176
## 94    mrcatime(Rhamphexocoetus_clade)     70.65030000000  1355.0746
## 95      mrcatime(Rhinocephalus_clade)     63.84230000000  1509.3286
## 96          mrcatime(Sphyraena_clade)     66.10570000000  2027.8272
## 97             mrcatime(Tarkus_clade)     55.33920000000   879.4581
## 98       mrcatime(Trachipterus_clade)     44.76320000000  1420.5267
## 99            mrcatime(Triodon_clade)     69.14390000000   563.6939
## 100          mrcatime(Turkmene_clade)     84.57110000000  1155.5939
## 101          mrcatime(Zenopsis_clade)     55.20110000000  2708.5902
## 102               logP(mrca(rootAge))      0.00000000000 25000.0000
## 103                 mrcatime(rootAge)    140.57470000000 10338.7332
## 104             freqParameter.1HKYGX1      0.38270000000 12390.6230
## 105             freqParameter.1HKYGX2      0.23050000000 14113.2549
## 106             freqParameter.1HKYGX3      0.15220000000 12582.0953
## 107             freqParameter.1HKYGX4      0.23310000000 13817.2306
## 108            freqParameter.4GTRIGX1      0.29890000000  8607.4667
## 109            freqParameter.4GTRIGX2      0.21820000000  7347.0180
## 110            freqParameter.4GTRIGX3      0.30880000000  8152.6108
## 111            freqParameter.4GTRIGX4      0.17280000000  7414.4529
## 112             freqParameter.9HKYGX1      0.21010000000 13315.3998
## 113             freqParameter.9HKYGX2      0.25910000000 13362.4276
## 114             freqParameter.9HKYGX3      0.30490000000 13011.9077
## 115             freqParameter.9HKYGX4      0.22500000000 13350.1196

As the effective sample sizes of all parameters exceeded 200, the posterior distribution of time trees was summarized using TreeAnnotator:

/home/linuxbrew/.linuxbrew/bin/treeannotator -burnin 50000000 -heights median 28-fossil-tip-dating-unif-root-prior.trees 28-fossil-tip-dating-unif-root-prior.tre
BEAST2 tip dating tree based on 12 calibrations and a uniform root prior

BEAST2 tip dating tree based on 12 calibrations and a uniform root prior

Comparisons of the posterior and the prior

These were performed for the uncorrelated lognormal node-dating BEAST analysis as well as for both the 12-fossil and 28-fossil tip-dating analyses. In the former case, 50,000 samples from the prior were obtained by running MCMC for 50 million generations with a sampling period of 1,000 generations; for the tip-dating analyses, 20,000 samples were used (500 million generations; 1 sample collected every 25,000 generations). The samples were obtained by re-running the original analysis with the corresponding option in BEAUti (“Sample from prior only – create empty alignment” in BEAUti 1.8 and “Sample From Prior” in BEAUti 2.4) using the following commands:

/home/linuxbrew/.linuxbrew/bin/beast -beagle_SSE -beagle_scaling dynamic BEAST-uncorr-lognorm-12-calib-prior.xml
java -jar lib/beast.jar -beagle_SSE -threads 8 12-tipdating-prior.xml
java -jar lib/beast.jar -beagle_SSE -threads 8 28-tipdating-prior.xml

All likelihood values were recorded as “NaN” in the tip-dating prior runs. Accordingly, the posterior corresponded exactly to the prior:

## [1] "Prior mean of the 12-fossil tip-dating prior sample: -670.859832953801"
## [1] "Posterior mean of the 12-fossil tip-dating prior sample: -670.859832953801"
## [1] "Prior mean of the 28-fossil tip-dating prior sample: -852.391311450328"
## [1] "Posterior mean of the 28-fossil tip-dating prior sample: -852.391311450328"

However, this was not the case for the UCLD node-dating analysis. An examination of the log file in Tracer revealed that the likelihood had oscillated about 0 for approximately 6 million generations (taking on both negative and positive values) before suddenly jumping up ~15,000 log units to a high positive value. Another such jump had occurred at ~38 million generations. The mean log posterior (by definition the normalized product of the prior and the likelihood) was therefore positive as well, rather than equal to the (negative) prior:

## [1] "Prior mean of the UCLD node-dating prior sample: -2070.21837205248"
## [1] "Posterior mean of the UCLD node-dating prior sample: 12429.5899812524"

However, since the corresponding XML file was checked and found to contain no data, this occurrence of non-zero and even positive likelihood values was treated as a peculiarity of the way BEAST 1.8 reports individual MCMC states that does not affect the validity of the analysis.

No burnin was specified when summarizing the distributions of the samples, since the chain started from the target distribution (i.e., the prior) instead of having to converge to it. This is supported by the trace observed in Tracer for both of the tip-dating analyses; however, in the node-dating analyses, the prior plummets from ~-1350 to ~-2250 log units at the same point when the likelihood rises from the proximity of 0, and oscillates about the latter value until the end of the chain. A pronounced second drop is observed at the 38-million generation mark (where the likelihood increases for a second time), but is rapidly followed by a return to the original value.

The plots below compare (1) the marginal (“induced”) prior (i.e., the FBD process conditioned on the age of the root and the monophyly constraints) and (2) the posterior for the tip-dating analyses and (1) the user-specified calibration densities, (2) the marginal (“induced”) prior obtained by multiplying these densities by the birth-death tree prior, and (3) the posterior for the node-dating analysis.

UCLD node-dating BEAST analysis: comparison of the user-specified (blue) and induced (green) prior

UCLD node-dating BEAST analysis: comparison of the user-specified (blue) and induced (green) prior

UCLD node-dating BEAST analysis: comparison of the user-specified priors (blue) and the posterior (red)

UCLD node-dating BEAST analysis: comparison of the user-specified priors (blue) and the posterior (red)

UCLD node-dating BEAST analysis: comparison of the induced prior (green) and the posterior (red)

UCLD node-dating BEAST analysis: comparison of the induced prior (green) and the posterior (red)

BEAST2 12-fossil tip-dating analysis: comparison of the induced prior (blue) and the posterior (red)

BEAST2 12-fossil tip-dating analysis: comparison of the induced prior (blue) and the posterior (red)

BEAST2 28-fossil tip-dating analysis: comparison of the induced prior (blue) and the posterior (red)

BEAST2 28-fossil tip-dating analysis: comparison of the induced prior (blue) and the posterior (red)

Refs:

Heath T 2016 Divergence time estimation using BEAST v2.: dating species divergences with the fossilized birth-death process. http://phyloworks.org/workshops/DivTime_BEAST2_tutorial_FBD.pdf (Accessed June 12, 2017)

Lepage T, Bryant D, Philippe H, Lartillot N 2007 A general comparison of relaxed molecular clock models. Mol Biol Evol 24(12): 2669–80

Ronquist F, Deans AR 2010 Bayesian phylogenetics and its influence on insect systematics. Annu Rev Entomol 55: 189–206

Warnock RCM, Parham JF, Joyce WG, Lyson TR, Donoghue PCJ 2014 Calibration uncertainty in molecular dating analyses: there is no substitute for the prior evaluation of time priors. Proc R Soc B 282(1798): 20141013